EDITORIAL NOTE


This is the first issue of the first complete vol-
ume of the Journal of Educational Psychology
to be published by the American Psychological
Association and the first issue under a new editor.
Delays and other matters incident to the transi-
tion of the journal to both a new publisher and a
new editor have resulted in certain temporary de- 
partures from usual policy and some delay in the 
publication of first issues. Contributors and read- 
ers will note in this and the next several issues, 
for example, certain inconsistencies in style and 
certain departures from standard practice of pub- 
lishing manuscripts in order of their receipt. In 
the future, contributors should follow the style 
prescribed by the Publication Manual of the
American Psychological Association.


R. G. K. 
PATTERNS OF PERSONAL PROBLEMS OF ADOLESCENT GIRLS¹
RICHARD E. SCHUTZ²


Teachers College, Columbia University


Numerous investigators have deter- 
mined the frequency with which various 
problems are listed by given samples of 
adolescents (4). However, in tabulating 
the problems it has been necessary to rely 
on a priori systems of classification. The 
customary procedure has been to cate- 
gorize the problems by activity or func- 
tional areas; e.g., school, home, health, 
etc.

The purpose of the present study was 
to determine the pattern or structure un- 
derlying the personal problems which 
adolescents recognize and are willing to 
report on a youth problems inventory. 
This pattern was investigated by extract- 
ing homogeneous clusters from a sample 
of 156 items selected from the inventory. 


PROCEDURE 


The inventory used in the study was the 
Billett-Starr Youth Problems Inventory, 
Senior Level (2), a check list intended to 
provide a means of systematically identi- 
fying the personal problems of individual 
adolescents. The 441 items which make up 
the Inventory include problems mentioned 
in the compositions and free responses of 
several large samples of high school stu-
dents which the authors obtained in de-
veloping the instrument. The items are
organized into 11 areas designated as fol-
lows:

1. Physical Health, Fitness, and Safety
2. Getting Along with Others
3. Boy-Girl Relationships
4. Home and Family Life
5. Personal Finance
6. Interests and Activities
7. School Life
8. Heredity
9. Planning for the Future
10. Mental-Emotional Health and Fitness
11. Morality and Religion

¹ Based on a dissertation submitted in partial fulfillment of the
requirements for the Ph.D. at Columbia University. The investigation
was carried out under the helpful direction of Robert L. Thorndike.
The writer wishes to thank Roger T. Lennon and the staff of the
Division of Test Research and Service, World Book Co., for making
the study possible.

² Now at Arizona State College, Tempe, Arizona.


The cluster analysis was based on the 
responses of 500 girls in Grades 10 and 11 
in two Pinellas County, Florida, high 
schools who took the Inventory as part of 
the national standardization program in 
May 1956. The schools are three-year high 
schools and had a segregated white en- 
rollment at the time of the study. The 
Inventory was administered in regular 
classrooms by regular teachers. The In- 
ventories were signed by the students. 

The basic technique of analysis was that 
described by Loevinger, Gleser, and DuBois
(6) for deriving clusters which have
maximum reliability as estimated by 
Kuder-Richardson Formula 20. Each 
cluster is obtained by starting with a triad 
of items having the highest covariance and 
adding items in succession, adding always 
the item for which the ratio of the sum 
of covariance with the items already in 
the cluster is a maximum. Items are added 
to the cluster until no more items remain 
which will increase this ratio. The process 
is repeated on the residual pool of items 
to form the second and subsequent clus- 
ters. 
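
As an illustration of the reliability estimate named above, the following is a minimal sketch of Kuder-Richardson Formula 20 for a matrix of dichotomous (checked / not checked) responses. The function name `kr20` and the toy data are assumptions made for illustration and are not taken from the study; the variance of total scores is computed with n in the denominator, which is one common convention.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous item scores.

    `responses` is a list of respondents, each a list of 0/1 item scores.
    (Hypothetical helper for illustration only.)
    """
    n = len(responses)       # number of respondents
    k = len(responses[0])    # number of items in the cluster

    # Proportion of respondents marking each item.
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    # Sum of item variances p*q.
    sum_pq = sum(pj * (1.0 - pj) for pj in p)

    # Variance of the total (cluster) scores, using n in the denominator.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n

    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Toy data: three items that always agree give a coefficient of 1.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(kr20(consistent))
```

The coefficient rises as the items covary more strongly relative to their individual variances, which is why a cluster built by repeatedly adding the item with the strongest covariance tends toward high reliability.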
A sample of 156 items was selected to
be included in the cluster analysis (8, pp. 
57-64). The items selected have the fol- 
lowing characteristics: (a) they are in- 
cluded in both the Junior and Senior levels 
of the Inventory; (b) they were rated as 
“very serious” or “moderately serious”
problems by a panel of 20 guidance spe- 
cialists; (c) each was marked by at least 
5% of the Ss in the sample. An attempt 
was made to make the sample of items 
representative of the 11 areas of the In- 
ventory and to include as many items as 
possible which have counterparts in other 
published problems check lists. 

The Inventory attempts to get at the 
intensity of a student’s problems by allow- 
ing him to differentiate between those 
which bother him “some” and those which 
bother him “very much.” For the present 
analysis each S’s “some” and “much” re- 
sponses were combined into a single cate- 
gory. 

Each S’s responses were multiple-
punched on an IBM card, and the 156
by 156 co-occurrence matrix was prepared 
using the counting sorter. The figures in 
the co-occurrence matrix were converted 
to percentages and the variance-covari-
ance matrix was prepared. The cluster analy-
sis was performed, and a check was made
on the factorial purity and reliability of
the obtained clusters.
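
The conversion from co-occurrence counts to a variance-covariance matrix can be sketched as follows. This is a minimal illustration that assumes the standard identity for dichotomous items (covariance equals the joint proportion minus the product of the two marginal proportions); the study does not spell out its arithmetic, and the function name and toy counts are hypothetical.

```python
def covariance_from_cooccurrence(counts, n):
    """Variance-covariance matrix for 0/1 items from co-occurrence counts.

    counts[i][j] is the number of respondents marking both item i and
    item j; counts[i][i] is the number marking item i. `n` is the
    number of respondents. (Hypothetical helper for illustration only.)
    """
    k = len(counts)
    # Convert counts to proportions.
    prop = [[counts[i][j] / n for j in range(k)] for i in range(k)]
    # cov(i, j) = p_ij - p_i * p_j; on the diagonal this reduces to p_i * q_i.
    return [
        [prop[i][j] - prop[i][i] * prop[j][j] for j in range(k)]
        for i in range(k)
    ]


# Toy data: 100 respondents; 50 mark item A, 40 mark item B, 30 mark both.
cov = covariance_from_cooccurrence([[50, 30], [30, 40]], 100)
for row in cov:
    print([round(value, 3) for value in row])
```

With 156 items the same computation yields the 156 by 156 matrix from which clusters are drawn.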


RESULTS


Three clusters were extracted from the 
pool of 156 items. Eighty-three items are 
included in Cluster I, 16 in Cluster II, and 
17 in Cluster III. An abbreviated Cluster 
I, consisting of 37 items, was formed by 
eliminating 35 items which correlated less 
than .40 with the complete cluster and 11 
items which nearly duplicated another 
item in the cluster; e.g., the two items,
“I’m often restless,” and “I’m restless
most of the time.” The items in the ab-
breviated Cluster I and in Clusters II and
III are shown in Tables 1-3. The items are
arranged in order of the magnitude of the
point biserial correlation of each item
with its cluster. The area and item num-
ber within the area are indicated in the
first column of each table. The Kuder-
Richardson Formula 20 reliabilities of the
clusters and their intercorrelations are
shown in Table 4.

The nature of a cluster must be deter-
mined by examining the items to discover
the general attribute they seem to hold in
common. The items in Cluster I cover a
broad area, coming from eight areas of
the Inventory. The cluster appears to re-
flect a general feeling of personal anxiety
and insecurity.

The items in Cluster II are currently
classified under seven different area head-
ings in the Inventory. They seem to in-
volve a feeling of nervous tension concern-
ing relationships with other persons. This
cluster is the least homogeneous of the
three, and its correlation with Cluster I
is nearly as high as its reliability. Both
Clusters I and II reflect personal anxiety.
While the items in Cluster II did not have
enough in common with the items in Clus-
ter I to be included in the more homoge-
neous general cluster, they shared sufficient
common variance to form another with
lower reliability.

Cluster III is the only cluster that does
not cut across the functional area organi-
zation of the Inventory to any great ex-
tent. Fifteen of the 17 items in the cluster
come from Area IV of the Inventory,
headed “Home and Family Life.” The
items all represent some kind of difficulty
in getting along with parents.

If a cluster is factorially pure, all of its
common factor variance should be ac-
counted for by a single centroid factor.
The ratio of the first factor variance to
the common factor variance thus provides
a basis for evaluating the factorial purity
of a cluster.

A complete centroid analysis (9) was performed independently on
the items in the abbreviated Cluster I and in Clusters II and III.
The highest correlation coefficient in each column of the matrix
was used as the communality estimate, communalities being
re-estimated by this method for every residual matrix. Factoring
was considered complete when both Tucker’s Phi and Coombs’
criterion (3) indicated that the last significant factor had been
extracted. Three factors were extracted from Cluster I, three from
Cluster II, and two from Cluster III. First factors account for
80%, 69%, and 85%, respectively, of the common factor variance of
the clusters.


TABLE 1
CLUSTER I

Item No.   r     Item
10-38      .59   People don’t understand me.
10-10      .57   I’m afraid of making mistakes.
10-15      .55   I’m often restless.
10-43      .51   I worry about what others say.
10-37      .51   I feel I’m not wanted.
10-17      .50   I’m disgusted with myself (dislike myself very much).
10-19      .49   I need someone to give me advice.
8-3        .49   I would like to be able to do something well.
8-1        .49   I don’t understand myself.
11-10      .49   Many times I don’t know what is right and what is wrong.
10-39      .48   People talk about me behind my back.
10-2       .48   I feel uncertain (unsure) about everything.
10-34      .47   I spend too much time daydreaming.
10-3       .46   I need to learn to depend on myself.
10-35      .45   I feel sorry for myself.
10-46      .45   I get excited too easily.
11-6       .45   I’m sometimes troubled by immoral (bad) thoughts.
7-5        .45   I wonder if I’ll pass.
10-32      .44   I don’t get out and go after what I want.
9-18       .44   I wonder if I’m taking the right subjects.
7-44       .44   I’m afraid to take tests.
10-58      .44   I’d like to know how to get rid of a bad habit.
4-46       .44   I’m unhappy at home.
7-62       .43   Some teachers never encourage or help me.
6-1        .43   I seldom have anything interesting to do.
7-85       .43   I would like to know how to get along with certain teachers.
8-2        .43   I wonder what my real mental ability is.
10-13      .42   I wonder what my future will be.
10-1       .42   I’m confused by the way things change.
2-50       .42   I feel lonely most of the time.
4-40       .42   I’m afraid to tell my (father) (mother) when I’ve done
                 something wrong.
10-42      .42   I’m blamed for things that aren’t my fault.
2-52       .41   I find it hard to make friends.
10-30      .41   I don’t know how to (pay attention) (work or study hard).
11-5       .41   I often tell lies.
10-45      .40   I’m bothered by people who find fault with me.
10-47      .40   I can’t control my temper.

To investigate the extent to which the reliability of the clusters
is dependent on chance factors, the Kuder-Richardson Formula 20
reliability of each cluster was computed, based on the results of
a new sample of 73 Ss selected from the same population as the
original sample. The reliability coefficients for the new sample
were as follows: Cluster I, .94; abbreviated Cluster I, .91;
Cluster II, .63; Cluster III, .83.

The significance of the difference between each of the
coefficients for the new and the original group was computed using
Fisher’s z transformation. The difference for Cluster II is
significant at the .001 level. There is no significant difference
between the coefficients for the other clusters.


TABLE 2
CLUSTER II

Item No.   r             Item
2-11       .63           I’m nervous when I talk to people.
2-15       .60           I’m not good at talking with people.
7-38       .55           I’m nervous in front of the class.
1-19       .49           I get tired easily.
2-10       .48           (I’m afraid) (I don’t like) to meet people.
2-39       .47           I want others to like me.
1-21       .44           I’m always nervous.
3-1        .44           I don’t understand (boys) (girls).
2-6        .38           I’m not good-looking.
6-10       .36           I would rather be alone.
6-9        .36           I get tired from too much activity.
6-11       .36           I spend too much time on (radio) (television) (movies).
1-22       [illegible]   I need to know more about sex (body changes at my age)
                         (new body functions).
10-40      .31           I’m afraid I seem conceited (stuck-up).
9-25       .30           I’m not sure whether I should go to college.
2-7        .28           I (don’t have) (don’t know how to pick) the right clothes.


TABLE 3
CLUSTER III

Item No.      r             Item
4-31          .64           My (father) (mother) is always criticizing (blaming)
                            (nagging) me.
4-43          .59           I can’t discuss things with my (father) (mother).
4-33          .58           My (father) (mother) is always expecting too much of me.
4-37          .54           I’m the cause of family quarrels (parents argue about
                            things I do).
4-51          .52           I’m thinking of leaving home.
4-41          .52           My (father) (mother) has little or no interest in what
                            I do.
4-39          .51           I sometimes lie to my (father) (mother) to get
                            permission to do something.
[illegible]   [illegible]   My (father) (mother) is often nervous and irritable.
[illegible]   [illegible]   My (father) (mother) never asks my opinion about
                            anything important to the family.
[illegible]   .49           I dislike my (father) (mother) very much.
[illegible]   .49           I don’t agree with my (father) (mother) about
                            [illegible] school activities.
4-48          .48           I’m sometimes ashamed of things my parents do or say.
4-38          .45           My (father) (mother) pries into my private affairs.
4-26          .44           My (brother) (sister) is always causing me trouble.
5-4           .43           I don’t get an allowance.
2-19          .40           I often “stretch the truth” when I tell [illegible].
4-49          .34           There’s too much drinking in our home.


TABLE 4
CLUSTER INTERCORRELATIONS AND KUDER-RICHARDSON RELIABILITIES

Cluster   I (83)   I (37)   II      III
I (83)    (.94)    .98      .67     .60
I (37)    .98      (.90)    .63     .52
II        .67      .63      (.71)   .34
III       .60      .52      .34     (.81)


DISCUSSION OF RESULTS


The findings of the study have implica- 
tions for both the theory and measure- 
ment of adolescent problems. While the 
study was not designed to test the validity 
of the various theories of adolescent prob- 
lems that have been proposed (1, 5, 7), 
it does provide evidence concerning the 
way in which problems cluster or “go to- 
gether.” 

The cluster structure does not corre- 
spond closely to any of the theoretical 
frameworks which have been proposed for 
classifying adolescent problems. The com- 
position of Cluster I suggests that a single 
dimension of personal anxiety underlies 
many of the manifest problems which 
theorists have used several dimensions to 
explain. 

The fact that two of the clusters cut 
across several areas of the Inventory in- 
dicates that the classifying of items in 
problems check lists into the traditional
functional or activity categories is in large
part an arbitrary procedure. However,
even though the conventional rubrics do
not all represent true functional unities,
teachers and counselors may still find the
categories helpful. The subdivisions often
suggest programs of action related to the
kinds of services that schools and other
social agencies are equipped to provide.
Thus, the traditional area organization 
often serves the purpose of suggesting foci 
of therapeutic action. 

Although no data are yet available con-
cerning the predictive validity of the clus-
ters, they provide a potential means of
screening adolescents in need of psycho-
logical help. They have high face validity
and are reasonably pure factorially. A 
study of predictive validity is required to 
determine the extent to which they are 
actually related to personal adjustment. 


SUMMARY 


A cluster analysis was performed on 156
selected items from the Billett-Starr
Youth Problems Inventory, Senior Level,
based on the responses of 500 adolescent
girls. Three clusters were extracted and
designated as follows: Cluster I, general
personal anxiety and insecurity; Cluster
II, tension concerning relations with
others; and Cluster III, difficulties in
getting along with parents. The implica-
tions of the cluster structure for the the-
ory and measurement of adolescent prob-
lems were briefly discussed.
READABILITY LEVEL AND DIFFERENTIAL TEST
PERFORMANCE: A LANGUAGE REVISION
OF THE STUDY OF VALUES
JEROME LEVY¹


Denver University 


Psychologists have recently become in- 
terested in the “readability” of material; 
that is, how the verbal difficulty of written 
material affects the communication of 
ideas. Hebb and Bindra (5), Ogdon (7), 
and Stevens and Stone (8) have pub- 
lished material relevant to this question. 
These writers have been concerned with 
readability in discursive writings; e.g.,
textbooks. The important question, how- 
ever, of how the readability level of a 
psychological test might affect the score 
which a subject with a defined level of 
verbal competence achieves has not 
hitherto been directly investigated. It ap- 
pears to the writer that this is a question 
of considerable importance to sound psy- 
chometric practice. The present study was 
designed to investigate this question. 

The Allport-Vernon Study of Values, 
1951 Revision (1), was selected as an ap- 
propriate instrument. Since its introduc- 
tion in 1931 and following its revision in 
1951, the Study of Values has been em- 
ployed in researches in personality theory, 
and for clinical evaluation and guidance 
purposes, and has proved quite useful in 
these areas. Many writers have noted, 
however, that the level of language em- 
ployed in the Study of Values is quite 
complex and difficult. Gough (4), for ex- 
ample, writes: “The language used is too 
academic and involved for use in groups 
very far removed from a scholastic en-
vironment.” Allport notes: “The scale is
designed primarily for use with college
students or with adults who have had col-
lege (or equivalent) education” (1). Al-
though it is not explicitly stated, one sur-
mises that the basis of this view is the
recognition of the verbal complexity of the
test, particularly in terms of vocabulary
usage. If, however, verbal difficulty is the
limiting factor as seems suggested, then
the Study of Values may not be a valid
test even for all college students. Classes
in remedial reading, basic communication,
and other related areas in every university
attest to the large numbers of students
deficient in these verbal skills.

¹ Now Chief Psychologist, Division of Mental Health, Colorado
State Department of Public Health.

The present study represents an at- 
tempt to produce a modification of the 
Study of Values congruent in all respects 
with the 1951 Revision, except that it em- 
ploys less difficult language. Such a modi- 
fication may be useful in two ways: if it 
can be demonstrated equivalent to the 
1951 Revision it may be validly used with 
Ss for whom the 1951 Revision is valid, 
and may also permit extension of the test 
to populations hitherto considered inap- 
propriate (i.e., noncollege level Ss). Sec- 
ondly, it may be utilized in demonstrating 
the effects of differential vocabulary abil- 
ity on test scores. 


PROBLEM 


There are four major phases to this re- 
search: (a) actual construction of the 
modification; (b) a test of the meaning 
equivalence of the items of the modifica- 
tion and the 1951 Revision; (c) a demon- 
stration that the language level of the 
modification is indeed lower than that of 
the 1951 Revision; and (d) a demonstra-
tion that there will be differences in per-
formance between the two forms for Ss 
whose vocabulary ability is below that in-
herent in the 1951 Revision which will be
significantly greater than the between- 
form differences for Ss whose vocabulary 
ability is equal to that of the 1951 Revi- 
sion. This is the major focus of the in- 
vestigation and the primary hypothesis to 
be tested. Consequent hypotheses for test- 
ing are: (a) For a high vocabulary ability 
group the between-form performance dif- 
ferences will be of a minimal nonsignifi- 
cant nature; (b) For a low vocabulary 
ability group there will be a large and 
statistically significant difference in per- 
formance on the two forms; and (c) For 
a group of Ss intermediate on the vocabu- 
lary criterion, the between-form differ- 
ences will fall between that for the high 
and that for the low groups. 


METHOD


Construction of the modification. Of the 
45 items in the 1951 Revision, all but one 
were reworded. Items were constructed 
which attempted to maintain the struc- 
ture and meaning of the equivalent origi- 
nal items, but which were worded much 
more simply. Where a choice of words 
with equivalent meanings was to be made, 
the simpler word was always chosen. 

Equivalence of meaning. Both the 1951 
Revision and the modified form were given 
to a group of judges with a rather hetero- 
geneous background in psychology. These 
judges were unfamiliar with the Study of
Values. They were supplied with the defini-
tions of the six predominant personality
types as they appear in the Manual of
Directions (1) so that they would have
clear referents for each scale. They were
presented the two forms in mixed order
and asked to judge which type of person
would answer each question with each al-
ternative choice. For example, in the item
“Assuming you have sufficient ability,
would you prefer to be: (a) a banker; (b)
a politician?” which of Allport’s types
would choose alternative (a), which would
choose (b)? The judges also matched the
items of both forms by number; the items
of each form having been randomized.

Readability. In order to evaluate the 
hypothesis that the language level of the 
modified form was simpler, a test of 
“readability” was employed. This is the 
Flesch formula (3). 
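
For readers unfamiliar with the measure, the Flesch Reading Ease score depends only on average sentence length and average syllables per word. The sketch below uses the commonly cited form of the formula; the function name and the sample counts are hypothetical and are not the counts obtained for either form of the Study of Values.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts for a sample of text.

    Higher scores indicate easier text. (Hypothetical helper for
    illustration only; counting words, sentences, and syllables is
    left to the caller.)
    """
    average_sentence_length = words / sentences
    average_syllables_per_word = syllables / words
    return (206.835
            - 1.015 * average_sentence_length
            - 84.6 * average_syllables_per_word)


# Made-up sample: 100 words in 5 sentences containing 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))
```

Shorter sentences and shorter words both raise the score, which is why substituting the simpler of two equivalent words, as in the construction of the modification, would be expected to raise it.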

Between-form differences. A group of 
157 young male Air Force Ss was ad- 
ministered the 1951 Revision, the present 
modification, and the Diagnostic Reading 
Test, Survey Section (2). The two forms 
of the Study of Values were given in bal- 
anced order of administration, with the 
Diagnostic Reading Test coming between 
them for all Ss. The vocabulary scale of 
the Survey Section was taken as the pri-
mary criterion for the selection of three
groups: a Low group of 22 Ss whose vo-
cabulary scores placed them at or below
the lower quartile of the eighth grade, a 
High group of 23 Ss with vocabulary 
scores at or above the upper quartile of 
the twelfth grade, and a Middle group of 
20 Ss with intermediate vocabulary scores. 
Thus there resulted three groups of ap- 
proximately equal size, with no overlap on 
the vocabulary criterion, whose tested vo-
cabulary ability closely corresponded to
the readability of the two forms of the 
Study of Values. The two forms were 
scored for each S, and a “deviation score” 
was also computed. This is merely the 
difference between S’s score on a scale on 
one form and his score on the same scale 
on the alternate form, either positive or 
negative (sign being dropped), and 
summed for all six scales. 
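
The deviation score just described can be sketched as a sum of absolute between-form differences. The scale names come from the Study of Values; the function name and the scores for the sample S are invented for illustration.

```python
SCALES = ("Theoretical", "Economic", "Aesthetic",
          "Social", "Political", "Religious")


def deviation_score(original, modified):
    """Sum over the six scales of |score on one form - score on the other|.

    `original` and `modified` map scale names to one S's scores on the
    1951 Revision and on the modification. (Hypothetical helper.)
    """
    return sum(abs(original[scale] - modified[scale]) for scale in SCALES)


# Invented scores for a single S.
original_form = {"Theoretical": 45, "Economic": 40, "Aesthetic": 35,
                 "Social": 38, "Political": 42, "Religious": 40}
modified_form = {"Theoretical": 42, "Economic": 43, "Aesthetic": 35,
                 "Social": 40, "Political": 41, "Religious": 39}
print(deviation_score(original_form, modified_form))
```

Dropping the sign matters here: signed differences across the six scales would largely cancel, so only the absolute differences register how far an S's profile shifts from one form to the other.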


RESULTS


The results of the judgments indicate 
that, in the opinion of these judges, the 
revised items do ask the same thing as 
the original items. The judges made the 
same categorizations of the original and 
the revised items; i.e., there was not a statistically significant
difference between the type judgments on the pairs of items. The
results of the item-to-item match show that, of the total of 506
matches the judges made, 495, or 98%, were matched correctly.

The Flesch formula yielded a “Reading Ease Score” of 52.7 for the
1951 Revision, placing it at the twelfth grade level; the score of
72.6 for the modified form is at the seventh grade level.


TABLE 1
SUMMARY OF t TESTS OF ORDER FOR THE HIGH, MIDDLE AND LOW GROUPS

Group             Theoretical   Economic   Aesthetic     Social   Political   Religious
Low Group (a)     .11           1.12       .65           .08      .29         .07
Middle Group (b)  .98           1.07       [illegible]   .44      .80         1.42
High Group (c)    .64           .26        .52           .58      .74         .59

a t.05 = 2.086, df = 20
b t.05 = 2.101, df = 18
c t.05 = 2.080, df = 21


TABLE 2
ANALYSIS OF VARIANCE TO DETERMINE SIGNIFICANCE OF BETWEEN-FORM
DIFFERENCES FOR THE LOW GROUP, AS COMPARED TO BETWEEN-FORM
DIFFERENCES FOR THE HIGH GROUP

Source    Sum of Squares   df   Mean Square   F
Between   609.158          2    304.579       3.955
Within    4,774.780        62   77.013
Total     5,383.938        64

                   Low Group   Middle Group   High Group
Mean Differences   24.182      22.150         17.043


Since some of the Ss in each of the 
three groups took the forms in original- 
modified order, and others took them in 
modified-original order, t tests were com-
puted to determine whether the order in
which the forms were taken has a signi- 
ficant effect on the between-form differ- 
ences. As may be seen from Table 1, none 
of these t’s was significant at the .05 
criterion level. It was therefore possible 
to combine all of the Ss within each of 
the three groups, and to treat each of the 
groups as a single unit for further statis- 
tical evaluation. 

As a test of the main hypothesis—that 
is, that the Low group would show sig- 
nificantly greater between-form differ- 
ences than would the High group—an 
analysis of variance was performed uti- 
lizing the deviation scores. The F value 
of 3.955 which was obtained, as shown 
in Table 2, is significant at, and beyond, 
the .05 level of confidence. This clearly 
substantiates the major hypothesis. The 
primary characteristic on which these two 
groups differ is that of diagnosed vocab- 
ulary ability, and the significantly 
greater between-form differences in the 
Low group seem clearly related to their
low vocabulary level. The group means 
of the difference scores, which are also 
contained in Table 2, bear out the hypo- 
thesis that as the more difficult form 
moves away from the Ss’ level of vocab- 
ulary ability they make increasingly 
different scores than they do on the form 
which is within their level of competence. 
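
The F value reported above can be checked directly from the sums of squares and degrees of freedom printed in Table 2. The sketch below only reproduces that arithmetic; the function name is hypothetical. With groups of 22, 20, and 23 Ss there are 3 - 1 = 2 degrees of freedom between groups and 65 - 3 = 62 within.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """F ratio for a one-way analysis of variance from its sums of squares."""
    mean_square_between = ss_between / df_between
    mean_square_within = ss_within / df_within
    return mean_square_between / mean_square_within


# Sums of squares and degrees of freedom as printed in Table 2.
f_value = f_ratio(609.158, 2, 4774.780, 62)
print(round(f_value, 3))  # 3.955
```

The result agrees with the reported F of 3.955.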

The groups show a good deal of con- 
sistency in mean scale values from one 
form to another, as indicated by the data 
in Table 3. On the Theoretical scale, the
Low group scores lowest on both forms,
the Middle group is intermediate on both
forms, and the High group scores highest
on both forms. The Aesthetic scale shows
the reverse order, with the same relative
consistency on both forms. The t tests were
computed to determine the significance
of change in performance on each of the
six scales for the High and Low groups;
these data are presented in Table 4.

The hypotheses employed in this investigation state, in effect,
that the two forms of the Study of Values are parallel forms for
the High vocabulary group where there are not the interfering
effects of vocabulary limitations, and are not parallel forms for
the Low group where these limitations do exist. If the two forms
are parallel, the correlation between performance on the
Modification and on the 1951 Revision should not differ
significantly from a test-retest correlation of performance on the
1951 Revision alone. In other words, if the two forms are
equivalent there should not be a statistically significant
difference between the correlation coefficients obtained by giving
the 1951 Revision twice, and those obtained by correlating
performance on the two forms. Pearson product-moment correlation
coefficients were computed to determine the degree of relationship
of the two forms in the High and Low groups. To determine whether
these r’s are significantly different from each other the r-to-z
transformation and critical ratio test of significance was
employed. These data are presented in Table 5. Because of the
impossibility of obtaining test-retest data on this group of Ss,
the test-retest coefficients are taken from the 1951 Revision
Manual (1).


TABLE 3
MEAN SCALE VALUES ACHIEVED BY THE LOW, MIDDLE AND HIGH GROUPS
ON THE 1951 REVISION AND THE MODIFICATION

Group                  Theoretical   Economic   Aesthetic   Social   Political   Religious
Low
  1951 Revision Mean   43.136        40.272     34.181      40.454   42.772      39.181
  Modified Mean        40.136        42.363     33.954      41.045   42.454      40.045
Middle
  1951 Revision Mean   44.500        43.550     32.350      35.250   43.350      41.000
  Modified Mean        44.950        42.150     33.600      36.850   41.300      41.150
High
  1951 Revision Mean   49.347        44.826     30.913      33.173   40.260      41.378
  Modified Mean        47.739        44.304     31.478      35.478   40.260      40.739


TABLE 4
t TESTS OF SIGNIFICANCE OF DIFFERENCES IN PERFORMANCE BETWEEN
1951 REVISION AND MODIFICATION ON EACH OF THE
SIX SCALES FOR HIGH AND LOW GROUPS

             Theoretical   Economic   Aesthetic   Social   Political   Religious
Low Group
  t          [illegible]   2.29       .37         1.36     1.13        1.77
  p          [illegible]   .05        NS          NS       NS          NS
High Group
  t          1.47          1.18       1.55        2.62     .46         1.97
  p          NS            NS         NS          .02      NS          NS


TABLE 5
TEST-RETEST CORRELATION COEFFICIENTS FOR THE 1951 REVISION AND OBTAINED
CORRELATION COEFFICIENTS BETWEEN PERFORMANCE ON THE 1951 REVISION
AND THE MODIFIED FORM FOR THE HIGH AND LOW GROUPS

              1951 Revision    Low Group correlations    High Group correlations
              (Test-Retest)    between 1951 Revision     between 1951 Revision
                               and Modified              and Modified
Theoretical   .87              [illegible]               [illegible]
Economic      .92              [illegible]               [illegible]
Aesthetic     .90              [illegible]               [illegible]
Social        .77              [illegible]               [illegible]
Political     .90              [illegible]               [illegible]
Religious     .91              [illegible]               [illegible]

Tests of Significance Between Correlation Coefficients

              High to Low       High to 1951 Revision   Low to 1951 Revision
              Critical Ratio    Critical Ratio          Critical Ratio
Theoretical   1.27              1.35                    2.68 (b)
Economic      2.65 (a)          1.47                    4.18 (c)
Aesthetic     3.27 (b)          0.00                    3.26 (b)
Social        .51               .76                     [illegible]
Political     3.18 (b)          .30                     3.57 (b)
Religious     2.60 (a)          0.00                    2.59 (a)

a .01 level
b .001 level
c .0001 level
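
The r-to-z transformation and critical ratio test mentioned above can be sketched as follows. This is a minimal illustration of the standard procedure for comparing two independent correlation coefficients; the function names, correlations, and sample sizes are hypothetical and are not the values obtained in this study.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation (the inverse hyperbolic tangent of r)."""
    return math.atanh(r)


def critical_ratio(r1, n1, r2, n2):
    """Critical ratio for the difference between two independent r's.

    The difference between the transformed coefficients is divided by
    its standard error, sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error


# Invented values: r = .90 in one sample of 30, r = .50 in another of 30.
print(round(critical_ratio(0.90, 30, 0.50, 30), 2))
```

The resulting ratio is referred to the normal curve, so that, for example, an absolute value above 2.58 corresponds to significance at the .01 level for a two-tailed test.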
DISCUSSION


It seems reasonable to expect that 
people who are highly theoretically ori- 
ented are unlikely to have a very low 
vocabulary level. Omitting any question 
of cause and effect, it does seem likely 
that there is a positive relationship be- 
tween these areas. People who are theo- 
retically oriented are likely to spend a 
good deal of time reading in many areas 
of knowledge, and in other ways are apt 
to develop a well-organized and compre- 
hensive vocabulary. Conversely, persons 
not theoretically oriented may reasonably 
be expected to engage less in these activ-
ities. What the writer proposes, in effect,
is that the higher mean Theoretical score
found for the Low group on the original form is spuriously
high. When items are not understood, 
choices are likely to be made on the basis 
of something other than true preference, 
and it is suggested that the higher mean 
for this group on the original form is due 
largely to noncomprehension. When, on 
the other hand, they are presented the 
same alternative choices in language that 
they can understand, they tend to score 
lower. Since as a group they are not highly 
theoretically oriented, the group mean 
score on this scale will fall as the meaning
of the items becomes clear. If this view
is correct, then the modified form is in-
deed a more valid measure of theoretical
values in this group. 

The lower Economic scale score for the
Low group on the 1951 Revision is sig-
nificant at the .05 level. It is suggested
that this result again reflects the effects of
noncomprehension. It is hypothesized that
when choices having to do with economic
values are presented to Ss in the Low
group in language that is meaningful to
them, they score higher on the Economic
scale because this reflects a “true” under-
lying orientation in this direction.

The results shown in Table 4 also in-
dicate that there is one scale on which the
High group Ss make significantly different
scores on the two forms: the Social scale.
Two tentative explanations may be ad-
vanced for the significantly higher score
on the modified than on the original form:
(a) this may be a chance phenomenon,
or (b) real differences may arise on this
scale in a highly verbal group as a result
of change in wording. A cross-validational
study to investigate the meaning of this
result is necessary.

Interpretation of the results of this 
study seems simple and straightforward.
Since none of the between-form correla-
tions for the High group differs signifi-
cantly from the test-retest correlation
coefficients, the modified form is con-
sidered a parallel form for this group.
For those Ss without vocabulary
limitations, the modification
yields results that do not differ significantly
from those which would be obtained
through the use of the 1951 Revision,
and the two forms may be considered 
equivalent for this group. However, the 
significant CRs obtained in the com- 
parison for the Low group indicate that 
the modification is not a parallel form 
for this group. Since the major char- 
acteristic on which the groups differ, and 
the basis on which they were selected, 
is differential vocabulary ability, it seems 
clearly evident that it is the vocabulary 
proficiency of the Ss which determines 
differential performance on the two forms. 
If the Ss can understand the language, 
then the modification is a parallel form 
of the Study of Values. If, however, the 
Ss’ language ability is not sufficient to 
cope with the 1951 Revision, the differ- 
ences in performance on the two forms 
are so great as to indicate that they 
represent different tasks. 

Since the modification is a parallel form 
for highly verbal Ss, it is likely that it 
is as valid for these Ss as is the 1951 
Revision. It is the writer’s opinion that 
the modification is a more valid form for 
low verbal people. Since the level of 
language is brought within the compre- 
hension of these low verbal Ss, random 
response choices based on noncompre- 
hension are likely to be decreased. Re- 
sponses based on an understanding of 
what the question is really asking are 
more likely to reflect truly the S’s ori- 
entation on the dimension being measured. 
External validation studies with different 
occupational groups, with other test 
criteria, and with generalized measures 
of personality are necessary for an em- 
pirical demonstration of validity. If on 
further investigation this modification re- 
ceives external validity support, it may 
well serve a useful function in psychological
evaluation. It will provide the
same sort of personality information as
does the 1951 Revision for highly verbal
Ss and will permit extension of the test
to populations hitherto considered inappropriate.

The factor of verbal level as an im- 
portant variable in pencil-and-paper psy- 
chological test results has, of course, been 
realized for some time. But this has been 
at a rather gross level. This is the first 
study, to this writer's knowledge, which
has attempted to manipulate verbal fa- 
cility as an experimental variable and to 
evaluate statistically the effect which this 
has on test performance. With the grow- 
ing concern about readability and com- 
munication in general, it is not likely to 
be the last. A good deal more study re- 
mains to be done with the modification. 
An item analysis is planned for the future. 
A covariance analysis of the relationship 
between differential test performance, ver- 
bal level, and intelligence is likely to 
clarify other important variables. But a 
promising start has been made in pro- 
ducing a meaningfully equivalent, yet less 
verbally complex, form of this popular 
test, and it has been possible to demon- 
strate the significant role which verbal 
facility plays in pencil-and-paper test re- 
sults. 


Summary 


This paper reports an attempt to pro- 
duce a meaningfully equivalent, but less 
verbally complex, form of the Study of 
Values, and to demonstrate the significant 
role which language facility plays in determining
score patterns on pencil-and-paper
tests of personality. Pilot data with
the modified form indicate that it is
judged as asking the same things as the 
Study of Values. Flesch counts of the 
1951 Revision and of this modification 
indicate that they are at the twelfth 
grade level and seventh grade level, respectively.
Three experimental groups
were established: a Low vocabulary
group, a High vocabulary group, and a
Middle group. The two forms were administered
to these Ss. Analysis of the
data indicates that for the High group 
the modification is an equivalent form. 
The Ss in the Low group make signifi- 
cantly different scores on the two forms, 
and the forms are not equivalent for 
this group. The differences in performance 
in this group are attributed to a vocab- 
ulary level inadequate to deal with the 
1951 Revision, and it is suggested that 
for these Ss the modification provides a 
more valid test of value orientations. 
Much additional exploratory work re- 
mains to be done with the modification 
before it may be completely accepted as 
a valid, equivalent form for all Ss. This
study serves to emphasize the important 
role of language facility in all pencil-and- 
paper personality tests. 
THE ATTITUDES STUDENTS ASSIGN TO THEIR TEACHER


NORMAN M. CHANSKY
Oswego State Teachers College


How students see and judge their teach- 
ers are behavioral operations which have 
more than theoretical importance. The 
validity of student judgments has never 
been determined, yet research studies in 
this area tacitly assume the factor of 
validity. That different students viewing 
the same instructor assign him varied, 
even contradictory, attitudes warrants in- 
quiry into the factors associated with such 
judging. The purpose of the present study 
was to examine whether attitudes assigned 
to a teacher can in any way differentiate 
between students with authoritarian out- 
looks and those with democratic outlooks. 


PROCEDURE 


After lecturing for three weeks about 
prenatal development, postnatal develop- 
ment, and theories of child development, 
the writer administered the Minnesota 
Teacher Attitude Inventory (MTAI) (3) 
to his classes in Child Psychology. The 
MTAI is a measure of attitude toward
democratic attitudes and practices in teaching,
based on the F Scale of the authoritarian
personality series of studies (1).
High scores on the MTAI indicate a demo- 
cratic attitude; low scores, an authori- 
tarian attitude. 

After having answered the items of the
MTAI, the students were asked to write
on the back of their answer sheets what
attitudes toward children they thought
their instructor held. Care had been taken
during the three introductory weeks of
the course to avoid controversial issues in
child psychology. At no time did the instructor
consciously cue the students to
his point of view.

During the subsequent 12 weeks of the
course, lectures and course activities centered
on (a) the law of effect as related
to the development of behavior in children,
(b) factors associated with acceptance
of children, and (c) the childhood
antecedents of self realization.

At the end of the course, the MTAI
was readministered and the students were 
asked once again to state the attitude 
toward children their instructor holds. 


RESULTS 


A content analysis of the first set of 
student statements of the attitude which 
the instructor was presumed to hold 
yielded seven distinct categories. These 
categories were: 

(1) freedom of children to manipulate 
the environment 

(2) development of socially precise be- 
haviors in children with punishment for 
deviants (discipline) 

(3) a cold, impersonal attitude toward 
children 

(4) the development of independence in 
children 

(5) respect for children 

(6) warm, friendly, personal relation- 
ship with children 

(7) helping children with learning and
emotional problems (clinical)

Some attitudes could not be classified and
were not included in the results.

The specific statistical tests (see Table
1) indicate that those students who felt
their teacher would encourage freedom in 
children received significantly higher 
MTAI scores than any other group. On the 
other hand, those students who saw their 
instructor taking a clinical attitude toward 
children received significantly lower MTAI 
scores than most of the other groups. 
Their scores were not reliably different 
from either the discipline or the imper- 
sonal groups. In addition, the group who 
TABLE 1
t RATIOS FOR THE HYPOTHECATED INSTRUCTOR ATTITUDES
BEFORE EXPOSURE TO TEACHER ATTITUDE


Instructor          Mean MTAI   t ratios for differences between attitude groups
attitude           N    score   Discipline   Impersonal Independence      Respect     Personal     Clinical
Freedom           16    53.25        4.83b        2.19a        2.39a        2.27a        2.21a        4.33
Discipline        12    15.42                     -1.81        -5.02c        1.86        -3.45b        0.78
Impersonal         9    31.77                                   0.77         0.02        -0.70         1.70
Independence      12    38.04                                                1.04         0.12         3.50c
Respect           12    31.50                                                            -0.75         3.59c
Personal           6    37.50                                                                          3.15b
Clinical           9    14.66


Note. Degrees of freedom are N1 + N2 - 2.
a Significant at P .05 level.
b Significant at P .01 level.
c Significant at P .001 level.


looked upon their instructor as a discipli- 
narian received significantly lower MTAI 
scores than the freedom, independence, 
and personal groups. 

These statistical tests indicate that, in 
the absence of well-defined cues, students 
projected their own attitudes toward the 
instructor (2). The reasoning behind this 
interpretation was as follows. Since cues 
to the attitude of the teacher toward chil- 
dren were minimized, the necessary ambi- 
guity for projection was present. Further- 
more, when those students who assigned 
such attitudes as freedom, independence, 
respect, usually associated with democ- 
racy, received higher MTAI scores, in- 
dicative of a democratic attitude, and 
when those students who perceived their 
instructor as either a disciplinarian or a 
clinician received lower MTAI scores, indicative
of an authoritarian attitude, the
interpretation of projection was confirmed. 
The relationship between discipline and 
authoritarianism is not unexpected, but 
why a service attitude such as that which 
a clinician holds received such low MTAI 
scores can be reconciled only if the clinical
attitude represents an expression of
the deep dependency needs of the authoritarian
personality. If this hypothesis be
substantiated in subsequent studies, it may
be worthwhile investigating whether this
clinical attitude means sensitivity to the
needs of children or if it is a subtle way
of expressing the wish for an anaclitic relationship
with an authority figure.

Data obtained after exposure to the
teacher's attitude are reported in Table 2.
Four new categories appear in the content
analysis of the student-assigned attitudes
of the teacher. These are patience, school
learning, no punishment, and understanding.
A chi-square test was made to determine
whether students assigned different
attitudes to their instructor after
he had presented his point of view. The
data yielded the non-significant chi-square
of 10.91 (10 df). Any change that took
place can be attributed to chance. A
coefficient of contingency of .46 was also
obtained. These data further suggest that
there was some consistency in responding
to the teacher-attitude question on the
two testings even after exposure to his
attitude.
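
As a rough check on the chi-square result reported above, the tail probability for 10 degrees of freedom can be computed directly. The sketch below is an illustration only, not the author's computation: the function names are hypothetical, the closed form used applies only to even degrees of freedom, and the contingency-coefficient helper takes the sample size as an argument because the number of students tested twice is not reported here.

```python
import math


def chi_square_tail(x, df):
    """Upper-tail probability of a chi-square value; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    # For df = 2k: P(X > x) = exp(-x/2) * sum over i < k of (x/2)^i / i!
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total


def contingency_coefficient(chi_square, n):
    """Pearson's coefficient of contingency, C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi_square / (chi_square + n))


# Chi-square value and degrees of freedom as reported in the text.
p_value = chi_square_tail(10.91, 10)
print(f"p = {p_value:.3f}")  # well above .05, so not significant
```

A probability this far above .05 agrees with the statement that any change between the two testings can be attributed to chance.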

The results of the retest suggest there 
was greater variability than there had 
been in the first testing. While the freedom 
group had received significantly higher 
MTAI scores than any other group during 
TABLE 2
t RATIOS FOR THE HYPOTHECATED INSTRUCTOR ATTITUDES
AFTER EXPOSURE TO TEACHER ATTITUDE


                   Mean MTAI   t ratios for differences between attitude groups
Attitude         N     score   Patience   Learning   Clinical    Respect No punish. Discipline   Personal  Independ. Understand.
Freedom         20     56.00       0.95       2.39a      3.40b      1.05       1.16       6.17c      0          0.05       1.11
Patience         5     46.60                  1.24       1.53      -0.16      -0.27       3.44      -0.96      -0.65      -0.12
Learning         8     29.50                            -0.11      -1.25      -1.52       1.50      -1.89      -1.62      -1.40
Clinical         6     31.00                                       -1.90      -2.33a      2.41a     -3.04      -1.72      -2.02
Respect         12     47.77                                                  -0.07       4.19c     -0.91      -0.60       0
No punishment    7     48.71                                                              5.09c     -0.94      -0.54      -0.08
Discipline      16     10.00                                                                        -5.43c     -3.46b     -4.37b
Personal        14     55.92                                                                                    0          0.99
Independence     5     55.40                                                                                               0.57
Understanding   13     48.00


Note. Degrees of freedom are N1 + N2 - 2.
a Significant at P .05 level.
b Significant at P .01 level.
c Significant at P .001 level.


the pretest, the retest results indicate that
the freedom group can only be distinguished
from the emphasis on school learning,
clinical, and the discipline groups.
The emphasis on school learning is a
characteristic of the authoritarian personality
according to the authors of the
MTAI. The results of the retest indicate
further that these three authoritarian
groups received significantly lower MTAI
scores than many of the democratic attitude
groups. The clinical group attained
significantly lower scores than the freedom,
no punishment, and personal contact
groups; the school learning group received lower
scores than the freedom group; and the
discipline group received significantly
lower scores than all groups except the
school learning group. It is interesting to note
that the clinical group was more democratic
in attitude than the discipline group.

Again, students assigning democratic
attitudes to the teacher received higher
MTAI scores, indicative of a democratic
attitude; students assigning antidemocratic
attitudes received lower MTAI
scores, indicative of an antidemocratic attitude.


SUMMARY AND CONCLUSIONS 


The purpose of this study was to de- 
termine whether students’ authoritarian 
or democratic attitudes toward children 
were in any way associated with their 
perception of their teacher. 

Because the students had not been cued to
their teacher's attitude during the first phase
of the experiment, the differences between those
who saw their instructor as one who would
encourage freedom, understand, or respect
children and those who saw
him as a disciplinarian or helper of the
helpless were interpreted in light of the
projective hypothesis. Students, without
their awareness, assigned attitudes which
they themselves held.

After the students had been continuously cued to
the teacher's attitude thereafter, the earlier
vagueness disappeared experimentally,
making projection less tenable as an explanation,
but not ruling it out entirely. The
writer believes the students assigning the 
different attitudes to their instructor oper- 
ated within different frames of reference. 
In this study, two frames of reference were 
possible: a democratic and an authori- 
tarian. Within a democratic frame of ref- 
erence, students selectively perceived in 
the teacher evidence for freedom of, pa- 
tience with, understanding of, respect of, 
and no punishment of children. Within 
an authoritarian frame of reference, students
saw their instructor as being primarily
interested in discipline, school
learning, and children with psychological 
problems. 

Certain attitudes which students assign
to their instructor, then, differentiate
between democratic and authoritarian students.
In addition, there is evidence that
rating of attitudes in another person will
be influenced to a great degree by the attitude
the rater himself holds. Hence, there
is a constant error in the judgment of
democratic attitudes in others. Democratic
raters are apt to give more democratic
ratings; authoritarian raters are apt
to give more authoritarian ratings.
THE SCHOOL PROGRESS AND ADJUSTMENT OF
UNDERAGE AND OVERAGE STUDENTS¹


CLYDE J. BAER 
Kansas City, Missouri, Public Schools 


The purpose of this study was to investigate
the question of whether or not a
child who begins school underage experi- 
ences similar problems and achieves the 
same level of development as if he had 
waited a year to enter school. 

In this city school system a chrono- 
logical age of five by November first is 
required for regular entrance into the 
kindergarten. However, children who will 
be five during November or December 
and who score a mental age of 5-0 or 
above on an individual intelligence test 
may be admitted. This study was made 
because some administrators and teachers 
in this school system feel that many of 
those children who enter prior to age five 
are too immature to be in school. However,
there is no mid-year entrance or
promotion, and children who are not admitted
must wait a full year to enter
school.


PROCEDURE 


Seventy-three children with birthdates
in November and December were matched
with 73 children with birthdates in January
and February who were in the same
school grade and who had entered kindergarten
in September of the same year. The
children were matched on the bases of intelligence
quotient, sex, and, in about two
thirds of the cases, the school entered.
During their eleventh year in school
the groups were compared on the bases
of physical size at the time of the study,
grade level attained, number of problems

¹ This article is based on the writer's doctoral
dissertation, the University of Kansas.
Grateful acknowledgement is made to
Henry P. Smith, under whose direction this
work was done.
marked on the Science Research Associates
Youth Inventory, and scores on the
Guilford-Zimmerman Temperament Sur- 
vey. From the cumulative records, com- 
parisons were made of marks in elemen- 
tary and high school subjects, achievement 
test scores, teacher rating on personal 
traits, and number of absences. 


RESULTS 


Table 1 describes the groups studied. 

The overage and underage students were 
not significantly² different in intelligence,
as measured by the Revised Stanford- 
Binet, Form L, administered at time of 
their entrance into kindergarten or during 
the regular school year. For the underage 
students the range of IQ was from 103 to 
127, and for the overage students the 
range was from 101 to 126. 

After eleven years in school, the overage
students were significantly taller but
not significantly heavier than the underage
students. They also had been significantly³
more successful in maintaining
regular progression from grade to grade
than the underage students.
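
The t test for matched pairs cited in the footnote can be sketched as follows. This is a minimal illustration, assuming the footnote's formula is the usual difference-score form, t = ΣD / sqrt((NΣD² - (ΣD)²) / (N - 1)), where D is the difference between the two members of a matched pair. The function name and the height figures are hypothetical and are not data from the study.

```python
import math


def paired_t(first, second):
    """t ratio for matched pairs, computed from the difference scores D."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    # Denominator of the difference-score formula: sqrt((N*sum(D^2) - (sum D)^2) / (N - 1))
    denom = math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    if denom == 0:
        raise ValueError("all differences are identical; t is undefined")
    return sum_d / denom, n - 1  # t ratio and its degrees of freedom


# Hypothetical heights in inches for five matched pairs (not study data).
overage = [67, 66, 68, 65, 69]
underage = [66, 66, 66, 64, 67]
t_ratio, df = paired_t(overage, underage)
print(f"t = {t_ratio:.2f} with {df} df")
```

Working from the difference scores rather than from the two group means is what lets the matching on IQ, sex, and school reduce the error term.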

During the elementary school years 
(kindergarten through Grade eight) the 


² Unless otherwise specified, each significant
difference reported here was statistically
significant at the .01 per cent level of confidence,
and was obtained by the application
of the t test of statistical significance, where

$$t = \frac{\sum D}{\sqrt{\dfrac{N \sum D^{2} - \left(\sum D\right)^{2}}{N - 1}}}$$

³ In this instance the chi-square test of
statistical significance was applied, and indicated
a level of confidence between two
and five per cent.
TABLE 1
NUMBER, IQ, AGE, HEIGHT, WEIGHT, AND
GRADE OF OVERAGE AND UNDERAGE
STUDENTS STUDIED


                 Overage             Underage
              Boys    Girls       Boys    Girls

Number          42       31         42       31
Mean IQ     111.17   111.31     111.24   111.40
Age
  15-5                              24       10
  15-6                              18       21
  16-3          17       21
  16-4          25       10
Height        5'7"     5'4"       5'6"     5'3"
Weight         148      120        144      115
Grade
  8                                  1
  9              3                   5        6
  10            38       30         36       25
  11             1        1


overage students were marked significantly 
higher than the underage students, but the 
differences between the overage and under- 
age students tended to decrease as higher 
grade levels were reached. The girls con- 
sistently were marked higher than the 
boys of their respective groups, but this 
difference showed no identifiable tendency 
either to increase or decrease. In the high 
school the marks received by the overage 
students were significantly higher than 
the marks made by the underage students. 


TABLE 2
MEAN RATINGS ON PERSONAL TRAITS FOR
OVERAGE AND UNDERAGE STUDENTS


                                        Underage  Overage  Difference
1. Participation in Group Activity        3.61     4.38      .77*
2. Attitude Toward School Regulations     3.84     4.30      .46*
3. Appearance                             4.37     4.64      .27*
4. Dependability                          3.66     4.27      .61*
5. Emotional Stability                    3.71     4.70      .99*
6. Initiative                             3.22     4.01      .79*
7. Cooperativeness                        3.64     4.28      .64*


* Significant at the .01 % level of confidence
Achievement test scores reported at
various grade levels during the elementary
school showed that the overage students
achieved significantly higher scores in
reading for Grades three, six, and eight;
in arithmetic for Grades four, six, and
eight; and in social studies for Grade five.
The differences between overage and underage
students were not significant in spelling for
Grade five; language for Grades five and
eight; and science for Grade seven.

Near the close of each school year 
every student is rated by his teacher on 
each of seven traits. Ratings recorded for 
each grade level from three through eight 
for each trait were summed and the mean 
taken for each pupil. Table 2 shows the 
mean group ratings for overage and under- 
age students according to a scale of one 
to five, with five being the highest rating. 
For all traits the overage students were
rated significantly higher than the under- 
age students. 

The girls were rated higher than the 
boys of their respective groups on all 
traits. For three traits, Attitude Toward 
School Regulations, Dependability, and 
Emotional Stability, the differences be- 
tween boys and girls were greater than the 
differences between underage and overage. 

The number of problems marked by 
overage and underage students on a prob- 
lem inventory was not significantly differ- 
ent. Although the overage and underage 
boys tended to mark the same problems, 
the underage boys marked more problems 
relating to home and family. The over- 
age girls marked more problems in the 
category dealing with school than in all 
the other categories combined. Although 
this category was also the most popular 
with the underage girls, they marked al- 
most as many problems in the section 
“After High School.” 

The results on the Guilford-Zimmerman 
Temperament Survey indicate that the 
overage boys tend to be less inclined to be 
suspicious or to see personal reference in
the words or actions of others, and the
overage girls tend to be somewhat more 
socially aggressive, with greater drive and 
energy. All of the mean scores for the girls 
and all but five of the 20 mean scores for 
the boys fall in the average range. 
Reported absences in elementary school 
show little difference between overage and 
underage students. From kindergarten 
through Grade eight the total median 
number of absences per year for both 
overage and underage boys was 6.50 days. 
For underage girls the total median was 
8.00 days, and for overage girls it was 
8.50 days. For both overage and underage 
the trend seemed to be for the greatest 
number of absences to occur in the pri- 
mary grades, decrease in number during 
the intermediate grades, and then increase 
again at about Grade eight, with the girls 
showing the higher rate of increase. 


CONCLUSIONS 


As a group, the overage children made
better school progress than did the underage
children. The overage children, from
kindergarten through Grade ten, made
significantly higher marks in subjects, significantly
higher scores on achievement
tests in reading, arithmetic, and social
studies, were rated significantly higher on
personal traits by their teachers, and were
significantly more successful in maintaining
regular progression from grade level
to grade level.

That the differences between boys and
girls were greater than the differences between
overage and underage for three of
the personal trait ratings may indicate a
sex-associated factor in these ratings. The
overage and underage boys marked essentially
the same problems on a youth
inventory, but the two groups of girls
did not show as much agreement on problems
as did the boys.

Although there is some evidence that 
the differences between the overage and 
underage students tended to decrease with 
higher grade levels, perhaps this is what 
should be expected since the advantage in 
mental age that the overage group carries 
in the elementary school grades tends to 
decrease as the students get older. 

Before concluding that it would be bet- 
ter for underage children to wait until 
the next year to begin school, it should be 
noted that most of the underage children 
made average school progress. As a group, 
they made average marks in subjects, 
average scores on achievement tests, re- 
ceived average ratings by their teachers 
on personal traits, and did not mark sig- 
nificantly more problems on the problem 
inventory than did the overage students. 

However, it should be remembered that 
both the overage and the underage chil- 
dren studied here were selected on the 
basis of intelligence (average IQ of each 
group about 111). Thus, a better than 
average performance may legitimately be 
expected for either group on certain of 
the measures used. 


Received August 4, 1967. 
A FURTHER NOTE ON BASAL METABOLISM
AND ACADEMIC PERFORMANCE 


MARY E. YARBROUGH 
Meredith College 


AND HAROLD G. McCURDY
University of North Carolina 


It is sometimes necessary to conclude 
that an attractive hypothesis is incorrect. 
In 1947 one of the present authors re- 
ported in this Journal a substantial cor- 
relation between BMR and the academic 
performance of a sample of college women 
(2). When his correlation was combined
with the lower but still positive correlation
reported in an earlier study by Patrick
and Rowles (4), the resulting r of .24
seemed large enough to justify the speculation
that college success might owe something
to basal metabolic rate. It was
stressed at the time, however, that the 
hypothesis called for a wider sampling of 
the college population. In the light of the 
further data which we can now offer, the 
Meredith College sample of 1947 appears 
as a statistical aberration and the sug- 
gested hypothesis untenable, since the true 
correlation between the variables at issue 
seems to be in the neighborhood of zero. 


NEW AND OLD DATA


In Table 1 are presented the data on 
which we base our rejection of the old 
hypothesis. It speaks for itself, but a few 
words regarding the studies which it sum- 
marizes will not be out of place. 

The studies are listed in the table in 
roughly chronological order. It will be 
noticed that the information from Omwake,
Dexter, and Lewis (3) was available
in published form in 1934. It was unfortunately
overlooked by McCurdy in
1947, but it would not have altered the
suggestion made at that time. These authors
state regarding their 72 Agnes Scott
students: “Those making a high scholastic
average tend to have high metabolism,
but little relationship between poor scholarship
and metabolism is evident.” Their
correlation of .138, when combined with
the .05 of Patrick and Rowles and the
.53 of McCurdy, contributes to a correlation
(by Fisher's z transformation) of .19
for 152 individuals, which is significant at
slightly better than the three per cent
level.
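
The pooling by Fisher's z transformation described above can be sketched as follows. This is a minimal illustration, assuming each correlation is converted to z, weighted by N - 3, averaged, and converted back to r. The weighting scheme and the normal-approximation significance test are assumptions, since the text does not spell out the exact computation, and the function name is hypothetical.

```python
import math


def pool_correlations(studies):
    """Pool (r, n) pairs with Fisher's z, weighting each z by n - 3."""
    weights = [n - 3 for _, n in studies]
    z_values = [math.atanh(r) for r, _ in studies]
    total_weight = sum(weights)
    mean_z = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    # Under the null hypothesis, mean_z * sqrt(total_weight) is roughly N(0, 1).
    z_stat = mean_z * math.sqrt(total_weight)
    p_two_tailed = math.erfc(abs(z_stat) / math.sqrt(2))
    return math.tanh(mean_z), p_two_tailed


# Correlations and sample sizes quoted in the text and in Table 1.
older_studies = [(0.05, 52), (0.138, 72), (0.53, 28)]
pooled_r, p_value = pool_correlations(older_studies)
print(f"pooled r = {pooled_r:.2f}, two-tailed p = {p_value:.3f}")
```

Under these assumptions the pooled value rounds to the .19 reported in the text, and the two-tailed probability falls between two and three per cent.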

The three new sets of data, all inde- 
pendently gathered, are those of Schutte 
at California, McCurdy at North Caro- 
lina, and Yarbrough at Meredith. The 
Schutte dissertation (5) has not been 
published, and the published abstract 
came to our attention quite recently; but 
the author has kindly furnished us with 
the important data which appear in the 
table in a personal communication.

Concerning the other two studies of recent
date a few details can be given.
McCurdy at the University of North
Carolina at Chapel Hill, through the
kindness of E. McG. Hedgpeth of the
University Infirmary, secured a large collection
of routine BMR records, of which
a certain number applied to students.
Some 560 of these were checked against
the academic files of the Central Office of
Records in order to find as many undergraduate
students as possible who had
taken courses falling within the central
liberal arts curriculum. Many were excluded
because their names did not appear
in the files, or because they were graduate
students, or because their curricula were
irregular. The final yield was 117 cases.
Point-hour ratio was ascertained for these
TABLE 1
SUMMARY OF STUDIES ON BMR AND COLLEGE SCHOLARSHIP

                                                      Point-Hour
Study                        N             BMR        Ratio       Correlation
Patrick & Rowles             52 F          -7.3       1.50          .05
  (Ohio University)                        ±8.97      ±.51
Omwake, Dexter, & Lewis      72 F                                   .138
  (Agnes Scott)
McCurdy                      28 F          -7.79      1.19          .53
  (Meredith)                               ±7.30      ±.55
Schutte                      556 F         -13.08                  -.06
  (University of California)               ±9.31
                             37 M          -12.36                   .104
                                           ±10.62
McCurdy                      102           -8.50      1.13         -.08
  (UNC)                      (37 F, 65 M)  ±10.35     ±.89
Yarbrough                    33 F          -8.30      1.32         -.05
  (Meredith)                               ±8.00      ±.72


Note. BMR and point-hour ratio determinations are not exactly comparable from study to study because of
differences in the machines used and in the assignment of points to letter grades.


cases in or near the academic quarter 
when the BMR was taken. These 117 
students ranged in age from 17 to 34; the 
distribution was skewed, and it was de- 
cided to cut off the older students at a 
break in the distribution, leaving a total 
of 102 with an age range of from 17 to 
24, symmetrically distributed around the 
mean age of 21. No significant differences 
appeared between the 65 men and 37 
women of this sample in age, point-hour 
ratio, BMR, or the relations between 
variables, and therefore the results are 
lumped together in the table. It was fully 
realized at the time that the manner of 
assembling these data did not afford a 
clean-cut test of the hypothesis, since errors
might have crept in at several points;
but the specific purpose was to discover
whether the hypothesized relationship was 
strong enough to stand up in the face of 
the sort of random errors which might be 
encountered by a practical-minded college 
dean or physician working upon the same 
hypothesis. The physician who lent the 
BMR records, incidentally, felt that the 
hypothesis was perhaps a realistic one. 
Yarbrough at Meredith, on the other
hand, attempted to follow strict criteria
both in sampling the students and in 
taking the BMR readings. The students 
were all in their first year of college, 
taking virtually the same program, and 
being subjected to very similar require- 
ments by teachers who value good aca- 
demic work. Three BMR records were 
taken on each girl, and each record was 
carefully inspected for possible technical 
flaws. As in the earlier Meredith study, 
the record yielding the lowest BMR for a 
given individual was utilized in the corre- 
lational analysis. In all essential respects 
the experimental procedures and the na- 
ture of the sample seemed comparable to 
those of the earlier study. But the correla- 
tion between BMR and scholarship was 
definitely much lower. 

It is clear from inspection of Table 1 
that the correlation of .53 in 1947 has to 
be considered as quite exceptional. We can 
think of no explanation except accident of 
sampling. The total weight of the evidence 
supports the view that BMR’s in the nor- 
mal range have little or nothing to do with 
scholarship at the college level. 
CONCLUSION AND COMMENTARY


The available evidence contradicts the 
plausible hypothesis that basal metabolic 
rate might affect academic performance in 
college. The correlation between the physi- 
ological and the intellectual indices used 
appears to be at or near zero. Perhaps 
the persuasiveness of the hypothesis in the 
first place depended entirely too much on
a superficial view of the interrelationships 
of energy, motivation, and thought. 

In psychological circles the null hy- 
pothesis is often regarded with distaste, 
and “negative” results are dreaded so 
much that, it is rumored, some investi- 
gators avoid repeating experiments lest 
chance should be against them the second 
time. If the rumor is true, then some of
our theories must have very spindly
foundations. No matter what the signifi- 
cance level of a single result may be, it is 
not sufficient by itself to establish a gen- 
eral working rule. The present summary 
of metabolic studies should make it clear
that one correlation in a set of six may be
freakishly different from the rest.


A remedial reading program is usually
a relatively expensive program since it
may entail individualized or small group 
instruction in addition to the school’s 
regular full-time program. Limitations of 
finances and time usually limit the amount 
of service available and it is desirable to 
utilize some sort of selective procedure in 
choosing cases for remedial instruction. 

This article describes the development 
and evaluation of a test for selecting the 
remedial readers most likely to profit 
from special instruction. For the purposes 
of this discussion, a “remedial reader” is 
defined as any child whose reading 
achievement is significantly retarded be- 
low both his grade placement and his 
capacity or potential for reading achieve- 
ment. In addition, the child in mind is 
specifically lacking in reading skills usu- 
ally developed by the third or fourth 
grade. 

The test is designed to duplicate as 
nearly as possible the “learning-to-read”
process for remedial readers. It was as- 
sumed that the primary learning task 
of the remedial reader is to learn associa- 
tions between written symbols and fa- 
miliar spoken language. By means of the 
experimental test an attempt has been 
made to provide the remedial reader with 
a controlled learning situation uniquely 
different from reading, yet so similar to 
the process of learning-to-read that per- 
formance on the test might be indicative 
of ability to profit from remedial reading 
instruction. 


DESCRIPTION OF TEST


The experimental test is basically com- 
posed of five short stories written entirely 
with pictures and symbols instead of
words. Each of these “test stories” is on
a successively more difficult level and is 
preceded by a practice page which in- 
troduces the new symbols used at that 
level and provides practice with the sym- 
bols in context. The manner of presenting 
the material on these introductory pages 
is similar to that of a reading lesson. 
Figure 1a shows the first group of symbols 
and the following line of practice material 
for Level I. A portion of the Level V 
test story is shown in Figure 1b. 

The total vocabulary of the experi- 
mental test consists of 72 symbols. Two 
of these symbols represent the inflections, 
“-ing” and “-s”. The “-s” inflection is
used both as a plural and to denote pos- 
session. 

The vocabulary of the first level con- 
sists of 14 symbols. These symbols repre- 
sent such words as “boy,” “the,” “cat,” 
“is,” “black,” and so forth. Many of the 
symbols at the first level consist of stick 
drawings or simple pictures of the object 
the symbol is intended to represent. Each 
successive level adds approximately 15 new 
symbols to the vocabulary being used in 
the test stories. 

The symbols at the higher levels do 
not provide picture clues. There are some 
phonetic elements common to a number 
of the symbols. However, this is not 
pointed out to the Ss. 

The materials for administering the test 
consist of an answer sheet and a test 
booklet. The test booklet contains ten
pages, two pages for each of the five 
levels, as described above. The answer 
sheet is used by the examiner in recording 
the errors made by the S while reading 
the test stories. 

[Figure 1 caption, fragment: “... a little girl’s birthday so they took presents for her.)”]

The score on the test is the total
number of errors made on the five test
stories read orally by the S. Hesitations 
of more than approximately five seconds 
(for which the word is supplied by the 
examiner), substitutions, omissions, and 
additions were considered errors. Repeti- 
tions were not so counted in this study. 
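
The scoring rule just described can be illustrated with a brief sketch. The category labels and the function name `score_errors` are assumptions made for illustration only; in the study the scoring was done by hand on the examiner's answer sheet.

```python
# Error categories that count toward the score, following the rule in the text.
# The string labels are hypothetical stand-ins for the examiner's marks.
COUNTED = {"hesitation", "substitution", "omission", "addition"}


def score_errors(events):
    """Return the total error score for one S.

    `events` lists every mark recorded across the five test stories.
    Repetitions may be recorded but are not counted as errors.
    """
    return sum(1 for event in events if event in COUNTED)


# Hypothetical record for one child (not data from the study).
record = ["substitution", "repetition", "hesitation", "omission", "repetition"]
print(score_errors(record))
```

A lower score therefore indicates better performance, which is why the correlations with reading achievement reported later are negative.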


PROCEDURE OF THE STUDY 


1. The first step in the development of 
the instrument consisted of small-scale 
tryouts and revisions with a series of 
crude tests and procedures. From this 
work evolved the Experimental Prog- 
nostic Test for Remedial Readers de- 
scribed above. 

2. The experimental test was administered
to a complete fourth grade class
of 24 pupils at Francis Willard School
in Eugene, Oregon during March of 1955.
The resulting data were analyzed for reliability
and concurrent validity information.


TABLE 1

CORRELATION OF EXPERIMENTAL TEST
ERROR SCORES WITH GATES
READING TESTS

(N = 24)

Vocabulary         -.51
Comprehension      -.65
Speed              -.58
Oral reading       -.48
Composite score    -.59

3. Next the test was administered to 
26 remedial readers in the Corvallis Public 
Schools for the purpose of evaluating the 
predictive validity. Each of these 26 re- 
medial readers was given the experimental 
test and a series of Gates reading tests. 
Following a four-month period of daily 
individual and small-group instruction 
these same children were given an alter- 
nate form of the reading tests. These data 
were analyzed for predictive validity in- 
formation. 


FINDINGS 


Concurrent validity. Twenty-four chil- 
dren in a regular fourth-grade classroom 
were administered the Gates Reading Sur- 
vey, the Gates Oral Reading Test, and 
the Experimental Prognostic Test. The 
experimental test was found to correlate 
—.59 with the composite measure of read- 
ing achievement. (Negative correlations 
are based on error scores.) Table 1 presents
the correlations of the experimental
test with the four reading achievement
subtests. These correlations are all significant
at the .05 level of confidence.


PROGNOSTIC TEST FOR REMEDIAL READERS

A mean score of 58.0 errors was made 
on the experimental test by this fourth- 
grade class. The standard deviation of 
scores was found to be 22.8 errors. 

Reliability. The experimental test re- 
liability, as computed by the Kuder- 
Richardson Formula 21, is .91. This value 
probably represents an underestimate 
since the data do not meet the assumption 
of equal difficulty for all items in the 
test. The split-half correlation was .92 
and a reliability of .96 is obtained when 
this value is corrected by the Spearman- 
Brown Formula. These values are prob- 
ably overestimated to a small degree. The 
standard error of measurement, based on 
the corrected split-half reliability, is 4.6 
errors. Since the standard deviation of 
scores, in this sample, is 22.8 errors it 
would seem to indicate that the test may 
have good discriminative properties. 
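
The corrected reliability and the standard error of measurement quoted above can be checked with the usual textbook formulas. The sketch below assumes the standard Spearman-Brown step-up for a test of doubled length and the standard error of measurement taken as SD × sqrt(1 − r); the function names are illustrative only.

```python
import math


def spearman_brown(r_half):
    """Step up a split-half correlation to full-length reliability."""
    return 2 * r_half / (1 + r_half)


def standard_error_of_measurement(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


# Values reported in the text: split-half r = .92, SD = 22.8 errors.
r_full = spearman_brown(0.92)
sem = standard_error_of_measurement(22.8, round(r_full, 2))
print(round(r_full, 2), round(sem, 1))
```

Under these assumptions the split-half value of .92 steps up to about .96, and a standard deviation of 22.8 errors gives a standard error of about 4.6 errors, in agreement with the figures reported.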

Predictive validity. Estimates of the 
predictive value of the experimental test 
were obtained from a group of cases referred
and selected for remedial reading
instruction. Since the total number of 
cases was relatively small and spread 
among several age levels, it was not pos- 
sible to apply correlational procedures to 
the data which had been collected. An 
alternative method was developed which 
consisted of pairing the remedial cases 
and predicting, on the basis of the ex- 
perimental test scores, which member of 
each pair would show the most gain by 
the end of the training period. In order 
to pair the cases they were grouped according
to age levels and each child was
paired with each other child at the same
age level if there was a significant difference
between their scores on the
Experimental Prognostic Test. Several
different values were established as “sig- 
nificant differences” in order to observe 
the amount of variance in the predictive
ability of the test. These significant dif- 
ferences in scores were arbitrarily estab- 
lished as two, three, five, eight, and ten 
times the standard error of measurement. 
On this basis it was usually possible to 
pair each child with a few other children 
at the same age level. The member of 
each pair showing the better performance 
on the prognostic test was predicted as 
the one most likely to gain from the 
special instruction. 
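
The pairing procedure may be sketched as follows. This is only an illustration under stated assumptions: the cases are invented, a difference is treated as "significant" when it is at least the chosen multiple of the standard error (the text does not say whether the boundary itself counts), and the name `make_pairs` is hypothetical.

```python
from itertools import combinations

SEM = 4.6  # standard error of measurement reported in the text (errors)


def make_pairs(cases, multiple, sem=SEM):
    """Pair children at the same age level whose error scores differ by
    at least `multiple` * sem.

    Returns (predicted_gainer, other) tuples; the child with FEWER errors
    on the prognostic test is predicted to gain more.
    """
    pairs = []
    for a, b in combinations(cases, 2):
        if a["age"] != b["age"]:
            continue  # pair only within an age level
        if abs(a["errors"] - b["errors"]) >= multiple * sem:
            better, other = (a, b) if a["errors"] < b["errors"] else (b, a)
            pairs.append((better["name"], other["name"]))
    return pairs


# Hypothetical cases; names, ages, and scores are invented.
cases = [
    {"name": "A", "age": 9, "errors": 40},
    {"name": "B", "age": 9, "errors": 52},
    {"name": "C", "age": 9, "errors": 90},
    {"name": "D", "age": 10, "errors": 45},
]
print(make_pairs(cases, 2))   # differences of at least 9.2 errors
print(make_pairs(cases, 10))  # differences of at least 46 errors
```

As in the study, raising the multiple leaves fewer pairs, since only the most widely separated children still qualify.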

This procedure experimentally dupli- 
cates a very real and practical adminis- 
trative problem in the area of remedial 
reading. The person responsible for se- 
lecting remedial reading cases is faced 
constantly with the question, “Which of 
these two children can profit the most 
from remedial instruction?” In essence, 
this study has experimentally created a 
series of such possible choice situations. 
The statistical purpose of this portion of 
the study was to determine whether the 
proposed instrument would predict, any 
better than chance, which children would 
profit the most from instruction. 

Table 2 shows the results of this por- 
tion of the study. Using two times the 
standard error (2 X Se) as a significant 
difference between prognostic test scores, 
it was possible to set up 44 pairs of 
remedial cases. Of these 44 predictions, 
34 were correct—indicating a predictive 
efficiency of .54 (54% better than chance). 
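
The text does not give a formula for "predictive efficiency." The sketch below assumes one reading that is consistent with the phrase "54% better than chance": the proportion of correct predictions rescaled so that chance (50 per cent) is zero and perfect prediction is one. The function name is illustrative.

```python
def predictive_efficiency(correct, pairs):
    """Proportion correct rescaled so chance (.50) maps to 0 and
    perfect prediction maps to 1. This formula is an assumption."""
    proportion = correct / pairs
    return (proportion - 0.5) / 0.5


# 34 correct predictions out of 44 pairs, as reported in the text.
print(round(predictive_efficiency(34, 44), 3))
```

With 34 of 44 correct the unrounded result is about .545; the reported .54 is what this formula gives if the percentage is first rounded to 77.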


TABLE 2

PREDICTIVE EFFICIENCIES OF EXPERIMENTAL
PROGNOSTIC TEST WITH CASES
GROUPED BY CHRONOLOGICAL AGE

(N = 26)

Differential    Pairs    Correct        Percentage    Predictive
(Se units)      (n)      predictions    correct       efficiency

 2×             44       34             77            .54
 3×             39       31             80            .60
 5×             25       20½            82            .64
 8×             17       15             88            .76
10×             12       11             92            .86

When higher values are used as a significant
difference between scores, the number
of pairs becomes less and the accuracy
of prediction steadily increases. Using a 
value of ten times the standard error 
provides 12 pairs and a predictive effi- 
ciency of .86. The steady increase in pre- 
dictive efficiency, as higher values are 
used for discrimination, is further evidence 
that the test is functioning predictively 
in this situation. 

When these same cases are paired on 
the basis of grade level rather than age 
level it is possible to obtain more pairings, 
but there is a decrease in the predictive 
efficiency of the instrument. This is probably
due to the more heterogeneous nature
of the sample in terms of age. (At one 
time the experimental test was adminis- 
tered to a number of children at various 
ages, and it was found that the number 
of errors rapidly decreases with an in- 
crease in the age of the Ss.) If a value 
of five times the standard error is used 
as a significant difference between scores 
it is possible to set up 35 pairs. This 
yields a predictive efficiency of .60 (as 
compared to .64 for 25 pairs when 
grouped by age). A predictive efficiency 
of .60 was the highest obtained when 
pairings were made on the basis of group- 
ing by grade (as compared to .86 when 
grouped by age). 

Twenty-three of the cases used in this 
portion of the study had had Stanford- 
Binet Intelligence Tests administered at 
some time. Predictive efficiencies were 
computed for these data using the same 
procedure used to predict with the ex- 
perimental test. No predictive value for 
the Stanford-Binet was found in this 
sample. However, since this portion of 
the study was subjected to limited con- 
trol the results do need to be interpreted 
cautiously. One significant point is 
brought out by this comparison. It was
quite difficult to obtain as large a number
of pairings from the intelligence test data 
as from the experimental test data. This 
was due to the relatively narrow range 
of performance available for discrimina- 
tion between Ss on the intelligence test 
results as compared to the experimental 
test results. The range of I.Q.’s was ap- 
proximately 30 points while the range of 
performance on the experimental test was 
approximately 100 points for the same 
Ss. In both instances, quite coincidentally, 
the size of the standard error of measure- 
ment is nearly identical. 


CONCLUSIONS 


Within the limitations of this study it 
would appear that the Experimental 
Prognostic Test for Remedial Readers has 
demonstrated a predictive value in select- 
ing cases for remedial reading instruction. 
Further research will be required in order 
definitely to determine the degree of this 
value. At present, it would appear that 
the test shows a greater possibility for 
predicting success based on gains during 
a period of instruction than the techniques 
in common use. However, it should be 
noted that the research concerning the 
predictive value of the techniques in 
current use is incomplete.

One is impressed by the paucity of 
research studies designed to consider some 
aspect of the prediction of success in 
remedial reading. Most studies which are 
referred to in this area have been origi- 
nally designed for some other purpose. For 
example, the writing and research which 
provides the basis for the use of intel- 
ligence tests as a predictor of success in 
remedial reading is primarily based on 
studies of concurrent validity obtained 
between intelligence and reading achievement
with groups of normals. These correlations
have not been based on the
predictive validity between the intelligence
test scores of retarded readers and their 
subsequent gains in remedial reading. 
This does not mean to say that the studies 
were not appropriate for the purposes 
originally intended by the authors, but
the application of the results of such
studies to the problem of selecting cases 
or predicting success in remedial reading 
will remain highly questionable until sub- 
jected to experimental verification. 


Received August 30, 1957.


EFFECT OF AN AUDIO-VISUAL PHONICS AID
IN THE INTERMEDIATE GRADES 


CAROLYN LUSER, EILEEN STANTON, AND CHARLES I. DOYLE
Loyola University, Chicago 


To test the effect of formal phonics
drill on a group basis, the present experiment
employed audio-visual aids consisting
of uniform recordings and individual
charts for each pupil in four experimental
rooms.*

The population was chosen from a lower 
socioeconomic area, with many semitran- 
sients. The sample contained many Ne- 
groes, Puerto Ricans, Mexicans, and a 
scattering of other nationalities. The area 
and population were selected in the hope 
of finding more than the average number 
of handicapped readers. This hope was 
realized when the pretests were scored. 
The average reading scores in all eight 
rooms were below the grade-level expect- 
ancy, and there were many seriously re- 
tarded readers. The number of drop-outs 
and absences confirmed the transient na- 
ture of the population. While nearly 300 
pupils were available for pretests, only 
214 completed the entire battery of tests 
at the conclusion of the experiment. 


*Permission for the use of these four 
schools was graciously given by Peter B. 
Ritzma, Chairman, Special Projects Com- 
mittee, and Don C. Rogers, Assistant Super- 
intendent, Chicago Public Schools, and by 
Rev. David C. Fullmer, Assistant Superin- 
tendent, Chicago Catholic School Board. 
The District Superintendent, Douglas Van 
Bramer, and the principals and teachers in 
the four schools also rendered much coopera- 
tion. 

The authors are indebted to the World 
Book Company for permission to reprint 
Stanford Primary Reading Test, Form D. It 
appeared to the authors that fill-in responses 
would be more valuable for detailed analysis 
than the multiple-choice style adopted in 
the 1953 forms. 

The phonographs, phonics records, and 
pupil charts for the experimental rooms were 
donated by Bremner-Davis Phonics. 



Two public and two parish schools were 
available in the area selected. The grade 
levels chosen for the experiment were the 
third and fourth grades, where reading 
difficulties tend to become more evident, 
and where pupils’ competence to follow 
group test instructions may be assumed 
more safely than at lower levels. 

In each school an experimental room 
and a control room were selected. The 
criterion for the selection of the experi- 
mental room was the lower average IQ 
derived from the Kuhlmann-Anderson 
Test, Form D (sixth edition). This form 
not only fitted the average grade expect- 
ancy of the experimental and the control 
rooms, but had the further advantage that 
the content was about equally divided be- 
tween verbal and nonverbal subtests. 

Besides the intelligence tests, four
achievement measures were secured for
each pupil: Gray’s Oral Reading Paragraphs,
Stanford Primary Reading, Form
D (paragraph meaning and word meaning),
and the Marion Monroe form (written)
of the Ayres Spelling Scale (3).

After the pretests, each of the four ex- 
perimental rooms received 43 twenty- 
minute sessions of phonics drill with the 
phonograph records and individual pupil 
charts (1). These sessions were spaced 
three times a week for a period of 15 
weeks. The experiment was limited to 15 
weeks because the pretests had to wait 
for mid-year promotions, and the retests 
had to be completed before final examina- 
tions in June. No special motivation was 
given during the drill periods, aside from 
the encouragement offered in the records 
themselves. All the phonics sessions were
conducted by one of the workers who had
administered the pretests, who was thus a
familiar figure to the children. 

At the conclusion of the experiment, the 
entire population, experimental and con- 
trol, was retested with the complete bat- 
tery described above. Retests were con- 
ducted by the same examiners who 
administered the original tests. 


RESULTS 


The data from both batteries were 
analyzed by two methods: by compu- 
tation of the standard error of the gains 
(2), and by chi square. The chi-square 
analysis was based on the assumption that 
a gain in individual scores in the experi- 
mental group substantially greater than 
the average gain evidenced by the control 
group would be valid to determine a
cutting point. Accordingly, for the achievement
tests, a gain of .5 grade score in 3.5
months was set as significant. For the
K-A IQ, allowing for practice effect much
greater than that shown by the control
group, a gain of three IQ points was
chosen.

While the chi-square test was a less
refined and precise statistic than that
based on standard error, it did in general
corroborate the findings of the latter. The 
results of both methods are summarized in 
Table 1. 
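
The chi-square analysis with a cutting point may be sketched as follows. The individual gains are invented, the treatment of a gain exactly at the cutting point as meeting it is an assumption, and the text does not say whether a continuity correction was applied; none is used here. The function names are illustrative.

```python
def count_gainers(gains, cut=0.5):
    """Split individual gains at the cutting point.

    Returns (number meeting the cut, number falling short)."""
    met = sum(1 for gain in gains if gain >= cut)
    return met, len(gains) - met


def chi_square_2x2(a, b, c, d):
    """Pearson chi square (1 df, no continuity correction) for the table
        [[a, b],
         [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical gains in grade-score units (not data from the study).
experimental = [0.9, 0.6, 0.7, 0.2, 0.5, 1.1]
control = [0.3, 0.6, 0.1, 0.4, 0.2, 0.0]

a, b = count_gainers(experimental)
c, d = count_gainers(control)
chi = chi_square_2x2(a, b, c, d)
print((a, b, c, d), round(chi, 2))
```

Reducing each pupil to "met the cut" or "did not" discards the size of the gain, which is one reason the chi-square result is the less precise of the two statistics.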

The gains in three of the four achieve- 
ment scores seem to indicate clearly the 
effectiveness of the standardized audio- 
visual drill. Failure to show a similar 
gain in word meaning is understandable 
in view of the background of the children 
in this sample. A marked gain in word 
meaning would seem to depend on en- 
riched experience and resultant growth in 
vocabulary more than on mastery of 
phonics. 

It seems evident that the unexpected 
gain in IQ points does not indicate a real 
change in intelligence. It does serve to 
point up the fact that performance on a 
group test of intelligence is markedly de- 
pendent on the pupils’ familiarity with the 
printed word. It is suggested, however, 
that improved habits of attention from 
group drill sessions may have contributed
in part to more successful performance
on the K-A retests of the experimental
group.


TABLE 1

TEST SCORES OF EXPERIMENTAL AND CONTROL GROUPS BEFORE
AND AFTER AUDIO-VISUAL PHONICS DRILL

                   Experimental      Control
                    (N = 105)       (N = 109)     Net diff.                    Chi
Tests               Pre    Post     Pre    Post     gain       t       P      square     P

Oral Reading   M    [?]    [?]      [?]    [?]      [?]       [?]     [?]      [?]      [?]
               s    1.15   1.40     1.13   1.29

Parag. Mean.   M    2.84   3.61     2.90   3.30     .37       4.40   >.001     8.87     .01
               s    [?]    1.16     .92    .97

Word Mean.     M    2.79   3.26     2.76   3.16     .07       1.11    .30*     1.22     .30*
               s    .78    .93      .83    .89

Spell.         M    2.79   3.26     2.76   3.00     .23       2.35    .02      4.26     .05
               s    .97    1.17     .84    [?]

IQ             M    [?]    [?]      [?]    [?]      [?]       [?]     [?]      [?]      [?]
               s    11.2   11.9     11.0   13.0

* Not significant
[?] Entry illegible in the source.
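
The net difference in gain reported in Table 1 can be checked from the group means. The sketch below uses the means as best they can be read from the table for the three tests whose rows are legible, and assumes that the net difference is the experimental gain minus the control gain; the function name is illustrative.

```python
# Means read from Table 1: (experimental pre, experimental post,
#                           control pre, control post), in grade scores.
means = {
    "Paragraph meaning": (2.84, 3.61, 2.90, 3.30),
    "Word meaning": (2.79, 3.26, 2.76, 3.16),
    "Spelling": (2.79, 3.26, 2.76, 3.00),
}


def net_gain(exp_pre, exp_post, ctl_pre, ctl_post):
    """Experimental gain minus control gain."""
    return (exp_post - exp_pre) - (ctl_post - ctl_pre)


for test, values in means.items():
    print(test, round(net_gain(*values), 2))
```

On this reading the three net gains come to .37, .07, and .23, matching the tabled values.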


SUMMARY 


A sample of 214 third and fourth 
graders from four schools in an under- 
privileged area was divided into two 
groups. A control group was measured 
against an experimental group which re- 
ceived 43 drill sessions in phonics, with 
uniform recordings and individual charts. 
When both groups were retested after 
the experiment, the experimental group 
showed gains on standard tests of oral 
reading, paragraph meaning, and spelling,
which were significantly greater than the
gains of the control group. Gain in word 
meaning apart from context was not sig- 
nificant. Retest performance on a paper- 
and-pencil intelligence test also showed a 
marked gain for the experimental group.


PREDICTING SCHOOL ACHIEVEMENT FOR BILINGUAL PUPILS¹


JAMES G. COOPER 


Territorial College, Agana, Guam, Marianas Islands 


Teachers and other school personnel of 
the Territory of Guam have long been 
aware of the need for adequate predictors 
of school achievement for their bilingual 
pupils. In December of 1956, Guam’s De- 
partment of Education authorized a study 
to discover to what extent, if any, cur- 
rently available tests would provide such 
predictions. Guam is a territory whose 
cultural patterns are rapidly changing. 
From a Spanish type of church-dominated
society, from a military paternalism, 
Guam is becoming infused with current 
American ideas, trends, and practices. 
The local language, Chamorros, is slowly 
giving way to English. However, English 
is spoken only in the school classroom, 
infrequently on the playground, and 
rarely in the home and community. Con- 
sequently, because of the unique cultural 
and language factors, currently available 
measures of intelligence must lie in the 
realm of questionable validity until dem- 
onstrated otherwise. 

This study endeavored to determine the 
predictive ability of six tests of intelligence
for certain fifth-grade pupils of
Guam. Only those tests which were wholly
or partially performance or nonverbal
were considered. In order to hold cultural
factors constant, four relatively isolated 
communities were selected. The villages 
of Inarajan, Merizo, Talofofo, and Umatac 
have had no electricity (and consequently 
no television) until quite recently; no 
Statesiders (persons whose usual abode 
is mainland, U.S.A.) as residents; no telephones;
few movies. Books and magazines
are not commonly found. These four villages
enrolled a total of approximately
180 fifth-grade children distributed among
six classes.


¹ This research was supported by a grant from
the James McKeen Cattell Fund.

The plan of study entailed the follow- 
ing: 

1. Administer three group tests of in- 
telligence to all fifth-grade pupils of the 
four villages. The tests were the California 
Tests of Mental Maturity, 1950 S-Form, 
Elementary; Davis-Eells Games, Inter- 
mediate Level; and the Culture Free In- 
telligence Test, Scale 2, Form A. 

2. Select a stratified, random sample
of 51 pupils from the larger group. Give
each of these pupils the following individual
tests: Leiter International Performance
Scale, Wechsler Intelligence Scale
for Children, and the Columbia Mental
Maturity Scale.

3. Give all pupils the California
Achievement Tests, Form AA, Elementary
level.

4. Obtain teacher ratings for each child
regarding his school success.

5. Relate intelligence test scores to
achievement test scores and to teachers’
ratings (applicable to group tests only).

6. Study the interrelationships between
certain tests.


PROCEDURE 


The plan indicated above was followed.
The Davis-Eells Games and California 
Tests of Mental Maturity were adminis- 
tered between November 1956 and Feb- 
ruary 1957. The Culture Free Intelligence 
Test was given during March of 1957 
and the California Achievement Tests 
given in May 1957. The teachers’ ratings 
were obtained prior to achievement test- 
ing. Individual tests were given from 
February through June 1957 in this sequence:
Wechsler Intelligence Scale for
Children, Columbia Mental Maturity
Scale, and Leiter International Performance
Scale.


The sample for individual testing was 
drawn via a table of random numbers. 
Each of the six classrooms was permitted 
to contribute its proper share of boys 
and girls. This was necessary because the 
schools of Inarajan and Merizo divided 
their fifth-grade pupils into fast and slow 
groups. Also, these schools enrolled twice 
as many fifth graders as Umatac and 
Talofofo. 
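
The idea of letting each classroom contribute "its proper share of boys and girls" may be sketched as a proportional stratified draw. Everything below is an assumption made for illustration: the strata and their sizes are invented, a seeded pseudo-random generator stands in for the table of random numbers, and largest-remainder rounding is one of several ways the shares might have been settled.

```python
import random


def stratified_sample(strata, total, seed=0):
    """Draw a proportional random sample from each stratum.

    `strata` maps a label such as (classroom, sex) to a list of pupil
    identifiers. Largest-remainder rounding makes the stratum sizes
    add up to `total`.
    """
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    quotas = {k: total * len(v) / population for k, v in strata.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    # Give the remaining places to the largest fractional remainders.
    leftover = total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return {k: rng.sample(v, alloc[k]) for k, v in strata.items()}


# Hypothetical strata (two classrooms only, invented sizes).
strata = {
    ("Class 1", "boys"): [f"1b{i}" for i in range(20)],
    ("Class 1", "girls"): [f"1g{i}" for i in range(15)],
    ("Class 2", "boys"): [f"2b{i}" for i in range(10)],
    ("Class 2", "girls"): [f"2g{i}" for i in range(15)],
}
sample = stratified_sample(strata, total=17)
print({k: len(v) for k, v in sample.items()})
```

Proportional allocation keeps the larger schools and the fast and slow sections represented in the sample in the same ratio as in the full group.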


Group Tests 


The principal findings from the three 
group tests of mental ability are shown 
in Table 1. The table shows that 164 
pupils obtained a mean California Test 
of Mental Maturity total IQ of 83.494 
with a standard deviation of 11.087; the 
Pearson correlation coefficient between 
these scores and the California Achieve- 
ment Tests was .644 and the stability of 
this coefficient is indicated by the .99 
confidence limits of .509-.747. These limits
were computed via the z transformation
(6, p. 147). Similar data are recorded
for California Language IQ, Nonlanguage 
IQ, Davis-Eells Games, and Culture Free 
Intelligence Test. The latter test is re- 
ported in raw scores because the IQs 
yielded a distribution severely skewed, 
i.e., too many low scores. The use of
raw scores gave a normal (by eye) distribution.
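
The confidence limits obtained via the z transformation can be approximated as follows. The sketch assumes the usual Fisher procedure (z = arctanh r, standard error 1/sqrt(N − 3), and a normal deviate of 2.576 for .99 limits); the exact tables and rounding used in the study are not stated, and the function name is illustrative.

```python
import math


def fisher_limits(r, n, z_crit=2.576):
    """Confidence limits for a Pearson r via Fisher's z transformation.

    z_crit = 2.576 corresponds to two-sided .99 limits.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Total IQ against achievement: r = .644, N = 164, as reported.
low, high = fisher_limits(0.644, 164)
print(round(low, 3), round(high, 3))
```

Under these assumptions the limits fall within about .001 of the reported .509 and .747; a difference of that size is what might be expected from rounded table values.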

A positive skew was noticed in the 
distribution of California Nonlanguage 
IQs. However, a chi-square test indicated 
that the obtained frequency distribution 
fitted the normal form adequately (3,
pp. 284-285).

The Davis-Eells Games scores were 
considerably lower than the other two 
tests. The possibility existed that this 
may have been caused by the pupils’
difficulties in understanding the directions.
(This test consists of a number of
pictures about which the examiner makes
various statements. The subject responds 
by indicating which statement best ap- 
plies to each picture.) This possibility 
was explored by retesting a fifth-grade 
class in Merizo. The second test was given
wholly in Chamorros. The average increase
of three points lacked statistical
significance. It was concluded that this
test measures as well in English as it
does in Chamorros.


TABLE 1

CORRELATION COEFFICIENTS BETWEEN GROUP TESTS OF INTELLIGENCE
AND CALIFORNIA ACHIEVEMENT TEST RAW SCORES

                                                              .99 limits
Group Test                          M         σ        r      (via z)       N

California Test of Mental
Maturity
  Total IQ                        83.494    11.087    .644    .509-.747    164
  Language IQ                     81.024    10.194    .584    .434-.703    164
  Nonlanguage IQ                  88.067    17.239    .522    .359-.655    164
Davis-Eells Games (IPSA)          66.970    11.146    .531    .369-.601    164
Culture Free Intelligence
Test (raw scores; this value
yields a mean IQ between
75 and 78)                        20.685     6.355    .549    .391-.676
California Achievement
Test
  Total raw score                154.451    36.178                         164

The matter of sex differences was an- 
alyzed for each of the group tests. None 
of the small differences approached the 
5% level of significance. The mean CA 
of the boys was 12-0 and of the girls 
11-8; the difference of four months was 
significant at 1%. 

Discussion of group tests. The data of 
Table 1 indicate that the California total 
IQ predicted California Achievement Test 
scores fairly well; the schools of Guam 
should consider seriously more widespread 
application of this test. It was interesting 
to note that neither the Language nor 
Nonlanguage IQ’s should be used separately.

Neither the Davis-Eells Games nor the 
Culture Free Intelligence Test offers as
much promise. However, from the point
of view of test theory, the fact that these
two tests do show pronounced positive
correlations with scores on a typical school 
achievement test is highly significant. In 
other words, the abilities sampled by 
these two measures are those which in 
part determine school success in a bi- 
lingual setting. 

Group tests and teachers’ ratings. The 
six teachers of the children tested were 
asked to rate their pupils according to 
the directions: “... Divide your pupils 
into three groups. Place the names of 
your best pupils into the top third, the 
poorest into the low third, and the others 
into the middle third.” The relationship 
between this sort and the four group 
tests was established by locating the 
median for each class and preparing a 
3 x 2 table from which chi square could 
be computed. A high value of chi square 
indicates that the teachers’ judgments and 
the pupils’ test positions (above or below 
the medians for their classes) were con- 
gruent. These data appear in Table 2. 
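
The computation behind each entry can be sketched for a single class. The counts below are invented; only the layout (teacher's thirds by position above or below the class median, giving 2 degrees of freedom) follows the text, and the function name is illustrative.

```python
def chi_square(table):
    """Pearson chi square for an r x c table of observed counts.

    Returns (statistic, degrees_of_freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical class of 30. Rows: teacher's top, middle, and low thirds.
# Columns: pupils above / below the class median on a test.
table = [[8, 2],
         [5, 5],
         [2, 8]]
stat, df = chi_square(table)
print(round(stat, 2), df)
```

With 2 degrees of freedom, chi square must reach 5.99 for the 5% level and 9.21 for the 1% level, so the invented class above would be significant at 5% but not at 1%.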


TABLE 2

CHI-SQUARE VALUES RESULTING FROM COMPARING TEACHERS’ RATINGS
WITH TESTS OF ABILITY AND ACHIEVEMENT

(df = 2)

                                        Name of Test
                     Calif. Mental Maturity
                                         Non-       Davis-    Culture    Calif.
School          N    Total    Language   language   Eells     Free       Achievement

Inarajan
  Teacher A    29    9.48**   [?]        [?]        [?]       [?]        [?]
  Teacher B    28    7.42     [?]        [?]        [?]       [?]        [?]
Merizo
  Teacher A    23    1.30     [?]        [?]        [?]       [?]        [?]
  Teacher B    32    1.28     [?]        [?]        [?]       [?]        [?]
Talofofo       34    4.49     7.63*      3.52       4.77      1.07       12.77**
Umatac         26    7.26*    3.22       9.96**     1.96      10.95**    17.91**

* Significant at 5% level.
** Significant at 1% level.
[?] Entry cannot be placed with certainty from the source. Values that
remain legible for these rows: Inarajan, .59**, 2.82, 5.93, 19.08;
Merizo, 1.81, 1.60, 4.71, 6.19, 8.06*, 2.44, 16.02**.

The table shows that the California
Test of Mental Maturity IQ’s agreed 
fairly well with the ratings of three 
teachers and that the Nonlanguage IQ 
was as effective as the Total IQ. This 
finding was surprising as teachers may be 
expected to give more weight to their 
pupils’ verbal behavior than to their non- 
verbal behavior. Neither the Culture Free 
Intelligence Test nor the Davis-Eells 
Games corresponded well with teachers’ 
ratings. The California Achievement Test, 
however, showed a pronounced agreement 
with teachers’ ratings. These latter data 
demonstrate a high degree of validity for
this test, i.e., the California Achievement 
Test seems to measure the kinds of 
achievements that these teachers deem 
important when they differentiate be- 
tween successful and less successful pupils. 
These data also suggest that this test 
may possess considerable curricular va- 
lidity. This point, however, should be 
verified by study of both teachers’ opin- 
ions and actual curricular materials used 
in these classes. 


Individual Tests 


The results obtained from administering
the three individual tests of intelligence
and correlating these scores with
the total raw scores from the California
Achievement Tests appear in Table 3.
The table shows that the best prediction 
was given by the Verbal Scale of the 
Wechsler Intelligence Scale for Children.
The greatest amount of scatter was given
by the Columbia Mental Maturity Scale
as shown by its sigma of 24.37; this test
also produced the highest mean IQ. 

The means of these individual tests 
were well below those given in the re- 
spective manuals. The obtained sigmas 
were three to four IQ points lower than 
those reported for the Wechsler Intelligence
Scale for Children (8), equal to
the Columbia Mental Maturity Scale’s
listed 25 IQ points (1), and lower than 
the figures given for the Leiter Interna- 
tional Performance Scale (4). The latter 
was difficult to determine, because the 
Leiter manual lacks specificity on this 
point. These data indicate that the IQ 
scores for the Guam sample were dis- 
tributed in a manner similar to the 
standardization groups. 

TABLE 3

PEARSON CORRELATION COEFFICIENTS BETWEEN CALIFORNIA ACHIEVEMENT
TESTS (RAW SCORES) AND IQs FROM INDIVIDUAL TESTS
(N = 51)

                                                                 .99 confidence
Test                                          M        σ      r  limits (via z)
Wechsler Intelligence Scale for Children
  Full Scale                                72.89    11.84   .77    .58-.88
  Verbal Scale                              71.58    11.30   .80    .63-.90
  Performance Scale                         77.15    12.70   .54    .24-.75
Columbia Mental Maturity Scale              83.86    24.37   .61    .34-.79
Leiter International Performance Scale      72.78    12.58   .66    .40-.82
California Achievement Test (raw score,
  sum of all tests)                        155.39    43.16

Discussion of individual tests. The data
presented in Table 3 show that the Verbal
Scale of the Wechsler Intelligence Scale
for Children gives a fairly accurate
prediction of school achievement in Guam’s
bilingual setting. It was interesting to note 
that the Wechsler Full Scale IQ was not 
quite so efficient a predictor as was the 
Verbal Scale IQ. This finding was not 
supported by comparable evidence con- 
cerning the California Test of Mental 
Maturity reported in Table 1; in the 
latter case, the Language IQ and the 
Nonlanguage IQ predicted about equally 
well. Wechsler states that of the Verbal 
Scale subtests, “Information” and “Com- 
prehension” are poor for those with in- 
adequate verbal facility and that “Arith- 
metic” is influenced by education (7, pp. 
80-82). He also notes that “Vocabulary” 
is affected by schooling (7, pp. 98-99). 
Therefore, it seems reasonable to believe 
that in a bilingual setting, successful 
school achievers will obtain higher scores 
on both the Wechsler Verbal Scale and 
on the California Achievement Test, i.e., 
we may be dealing with common elements 
rather than with intelligence per se. 
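
The .99 confidence limits "via z" in Table 3 can be approximated with Fisher's r-to-z transformation. The sketch below is an illustration only: the function name is hypothetical, and it assumes the limits were formed as z plus or minus the normal deviate times 1/sqrt(N - 3), a computation the article does not spell out.

```python
import math
from statistics import NormalDist


def fisher_z_limits(r, n, confidence=0.99):
    """Approximate confidence limits for a Pearson r via Fisher's z.

    Assumption: standard error of z is 1 / sqrt(n - 3).
    """
    z = math.atanh(r)                      # r-to-z transformation
    se = 1.0 / math.sqrt(n - 3)            # standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided deviate
    # Transform the limits back from z to r.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Wechsler Full Scale IQ against achievement, from Table 3: r = .77, N = 51.
low, high = fisher_z_limits(0.77, 51)
print(round(low, 2), round(high, 2))
```

For the Full Scale row this gives limits of roughly .57 and .88, within about .01 of the tabled .58-.88; small discrepancies of this size are to be expected if the original limits were read from printed conversion tables.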

The data relating to the Leiter International
Performance Scale are of considerable
concern. This test is expensive
(about $200), time-consuming to administer
(from 45 to 90 minutes), and very
bulky. Its correlation with the California
Achievement Test, although high (r = .66),
was only slightly higher than that of the
relatively inexpensive Columbia Mental
Maturity Scale (r = .61). Further, the
Columbia may be administered in from 10
to 15 minutes, a factor of real importance
to many psychometrists. On the positive
side, however, both the Leiter and the
Columbia have demonstrated considerable
validity for use with these bilingual
pupils. These data are all the more
significant when it is recalled that the Leiter
is a completely nonverbal test and the
Columbia is almost so (five cards on the
Columbia require the ability to read
words or letters, and five involve numerals).

Interrelationships between tests. The
Leiter International Performance Scale
and the Columbia Mental Maturity Scale
are relatively new arrivals to the scene
of mental measurements. In order to
clarify the nature of their functioning,
their scores were correlated with each
other and with the other tests of mental
ability utilized in this study. The results
are shown in Table 4. The table indicates
that the Leiter measures quite consistently
with the other tests, since the correlation
coefficients ranged from .62
(California Test of Mental Maturity,
Language IQ) to .83 (Wechsler Full Scale
IQ). The Columbia followed a pattern
similar to the Leiter, but the coefficients
were somewhat lower.

TABLE 4

INTERCORRELATIONS BETWEEN TESTS
(N = 51)

                                               Leiter          Columbia
                                               International   Mental
                                               Performance     Maturity
Test                                           Scale           Scale
California Test of Mental Maturity
  Total IQ                                       .68             .62
  Language IQ                                    .62             .54
  Nonlanguage IQ                                 .66             .60
Wechsler Intelligence Scale for Children
  Full Scale IQ                                  .83             .74
  Verbal Scale IQ                                .73             .66
  Performance Scale IQ                           .78             .68
Davis-Eells I.P.S.A.*                            .72             .69
Culture Free Intelligence Test, Raw scores
  (N = 50)                                       .75             .60
Columbia Mental Maturity Scale IQ                .69

* Index of Problem Solving Ability.


SUMMARY AND CONCLUSIONS


This study was undertaken to ascer- 
tain to what degree, if any, currently 
available measures of intelligence predict 
school achievement for the bilingual pu- 
pils in the Territory of Guam. Three 
group tests, the California Test of Mental 
Maturity, 1950 S-Form, Elementary; the 
Davis-Eells Games, Intermediate Level; 
and the Culture Free Intelligence Test, 
Scale 2, Form A were given to 164 pupils 
in grade five. Three individual tests of 
intelligence: the Leiter International Per- 
formance Scale, the Wechsler Intelli- 
gence Scale for Children, and the Co- 
lumbia Mental Maturity Scale were given 
to a stratified, random sample of 51 pu- 
pils. School achievement was defined pri- 
marily by scores received on the Cali- 
fornia Achievement Tests, Form AA, 
Elementary Level, and secondarily by 
teachers’ ratings. 

All the intelligence tests correlated positively
with the California Achievement
Tests. The correlation coefficients ranged
from .53 to .77 as follows: Davis-Eells
Games, .53; Culture Free Intelligence
Test, .55; Columbia Mental Maturity
Scale, .61; California Test of Mental
Maturity, .64; Leiter International
Performance Scale, .66; and the Wechsler
Intelligence Scale for Children, Full Scale,
.77.

Although teachers’ ratings corresponded 
well with rank on the achievement test, 
they were not closely related to scores on 
the group intelligence tests.

This study demonstrated that the six 
intelligence tests examined predicted 
school success with a degree of accuracy 
ranging from moderate to high for 
Guam’s bilingual pupils. 
While most psychologists who teach in
our colleges and universities are very 
much interested in changing their stu- 
dents’ attitudes, few of them are report- 
ing studies of the problem. In their review 
of research in the teaching of psychology, 
Birney and McKeachie (1) show the 
paucity of this kind of investigation. Fur- 
thermore, most of the attitude studies 
which they describe deal with the intro- 
ductory course. Little is being done to 
discover the nature of attitudes related to 
advanced courses in psychology, either on 
the graduate or undergraduate level. The 
study presented in this paper is intended 
to help close that gap. 

A second-level course which is becoming 
increasingly popular with undergraduates 
is child psychology. The students who 
take this course vary considerably in their 
educational and vocational goals, but al- 
most all of them share at least one com- 
mon objective: they hope to learn some- 
thing about children which will help them 
as future parents. They are especially in- 
terested in learning how their own atti- 
tudes might affect their children’s develop- 
ment. It is important, therefore, that 
those who teach this course find out what 
kinds of attitudes toward parent-child 
relationships their students have, and 
how these attitudes change when they 
study child psychology. As a step in this 
direction, a recent pilot study by Hurley
and Laffey (2), involving 19 students,
concluded that a 10-week child psychology
course at Michigan State University
succeeded in making these students
less rejecting in their attitudes toward
children. There was no change in
overprotecting attitudes.

1 A briefer version of this paper was read
on September 4, 1957, at the convention of
the American Psychological Association in
New York City.

The present paper reports a more ex- 
tensive investigation of the same kind of 
problem explored by Hurley and Laffey. 
It attempts to answer this question: To 
what extent can an undergraduate, one- 
semester course in child psychology change 
students’ attitudes toward parent-child 
relationships? 


METHOD


The subjects of this study consisted of 
four different classes totaling 157 stu- 
dents. None of them had previously taken 
a course in either child psychology, child 
development, or family relations. They 
represented a variety of majors within 
the College of Liberal Arts and Sciences 
at the University of Illinois, and were also 
drawn from other colleges within the 
university. 

The content of the course was the same 
for each class, with emphasis on problems 
of parent-child interaction. All classes 
were taught by the same instructor. In 
addition to the use of a basic text and 
lectures as sources of information and at- 
titudes, films were shown and discussed. 
Supplementary readings also helped to 
extend the variety of course content. 

A parent-child attitude scale was ad- 
ministered to each class at the beginning 
and end of the course. The scale was also 
given at the beginning and end of the se- 
mester to 155 undergraduates at the same 
university who were enrolled in a one se- 
mester introductory course in sociology. 
This was done to see whether or not 
changes which might occur in the psychology
students could be simply the result
of general college living rather than tak- 
ing child psychology. Like the psychology 
classes, the sociology students represented 
a variety of majors within the College of 
Liberal Arts as well as other colleges. 
They were approximately the same as the 
psychology students in age and college 
class. None had ever taken a course in 
child psychology, child development, or 
family relations, nor was any taking one 
at the times the scale was administered. 

The attitude scale was a slightly modi- 
fied version of one constructed and vali- 
dated by E. J. Shoben (4). Ten items 
from Shoben’s scale were omitted, be- 
cause they represented a miscellaneous 
group which lacked homogeneity, and 
therefore were not as meaningful for this 
study as the other items. In all other 
respects the scale was identical with Sho- 
ben’s. It consisted of 75 statements about 
parent-child relationships to which stu- 
dents responded by strongly agreeing, 
mildly agreeing, mildly disagreeing, or 
strongly disagreeing. The scale measured 
three kinds of attitudes: dominating (40 
items), possessive (20 items), and ig- 
noring (15 items). 

Dominating attitudes “reflect the ten- 
dency on the part of the parent to put 
the child in a subordinate role, to take 
him into account quite fully but always 
as one who should conform completely to 
parental wishes under penalty of severe 
punishment” (4, p. 137). The following 
statements from the scale represent this 
kind of attitude: “A child should have 
strict discipline in order to develop a fine, 
strong character.” “It is sometimes neces- 
sary for the parent to break the child’s 
will.” 

Possessive attitudes “reflect the ten- 
dency to ‘baby’ the child, to emphasize 
unduly (from a mental hygiene view- 
point) the affectional bonds between par- 
ent and child, to value highly the child’s 
dependence on the parent, and to restrict
the child’s activities to those which can
be carried on in his family group” (4, p. 
137). Statements illustrating this attitude 
are: “The best child is one who shows 
lots of affection for his mother.” “Children 
should always be loyal to their parents 
above anyone else.” 

Ignoring attitudes reflect “the ten- 
dency on the part of the parent to disre- 
gard the child as an individual member of 
the family, to regard the ‘good’ child as 
one who demands the least parental time, 
and to disclaim responsibility for the 
child’s behavior” (4, p. 137). Sample
statements representing this kind of at- 
titude are the following: “Parents cannot 
help it if their children are naughty.” 
“Quiet children are much nicer than little 
chatter-boxes.” 

All scoring of responses followed Sho- 
ben’s weighting system, a procedure he 
had found to result in parents of problem 
children making significantly higher 
scores, on the average, than parents of 
nonproblem children. Scores were inter- 
preted as follows: The higher the score, 
the more intense is the attitude; the lower 
the score, the less intense is the attitude. 


RESULTS 


As Table 1 indicates, the psychology 
students expressed a significant decrease 
in the intensity of their attitudes, as 
measured by the total scale. The mean 
change was 7.63 scale points. This shows 
that in general the attitudes of the psy- 
chology students toward parent-child re- 
lationships became more permissive by 
the end of the course. 
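
As a worked check on the figures in Table 1, the spread of the individual change scores can be recovered from the reported mean change and t ratio. This is only a sketch under a stated assumption: that each t is the mean difference divided by its standard error for correlated (before and after) scores, which the article does not state explicitly. The function name is hypothetical.

```python
import math


def implied_sd_of_differences(mean_diff, t, n):
    """SD of individual change scores implied by a paired t ratio.

    Assumption: t = mean_diff / (sd_diff / sqrt(n)).
    """
    standard_error = mean_diff / t         # SE of the mean difference
    return standard_error * math.sqrt(n)   # back out the SD of differences


# Total scale, Table 1: mean change 7.63, t = 7.00, N = 157.
sd_total = implied_sd_of_differences(7.63, 7.00, 157)
print(round(sd_total, 2))
```

Under that assumption the implied standard deviation of change on the total scale is a little under 14 scale points, so the average shift of 7.63 points is about half a standard deviation of the individual changes.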

The psychology classes also changed 
significantly in each of the three attitude 
dimensions which made up the total scale. 
Table 1 shows that the greatest change 
was in dominating attitudes, with a mean 
decrease of 5 scale points. Changes in
possessive and ignoring attitudes were
smaller, being 1.51 and 1.12 scale points
respectively.

These changes were evidently the result
of taking child psychology, since an
analysis of the sociology classes showed
no significant change in attitudes. Table
2 reveals this lack of change between the
pretest and posttest scores. A comparison
of the initial attitudes of the sociology
and psychology students shows that both
groups expressed essentially the same
intensity of attitudes at the beginning of
their respective courses. Only the psychology
students, however, changed significantly.
This fact supports the conclusion
that the psychology students changed as
a result of taking a course in child
psychology, and not simply because of
general college experiences.

TABLE 1

ATTITUDES TOWARD PARENT-CHILD RELATIONSHIPS BEFORE
AND AFTER COURSE IN CHILD PSYCHOLOGY
(N = 157)

                       Before course       After course     Difference
Attitude measured      Mean score   SD     Mean score   SD   in means      t
Dominating               154.55   11.23      149.55   10.18    5.00      7.46*
Possessive                73.50    6.30       71.99    5.82    1.51      3.75*
Ignoring                  52.40    3.87       51.28    3.86    1.12      3.29*
Total scale              280.45   16.30      272.82   15.93    7.63      7.00*

Note. The higher the score, the more intense is the attitude.
* p < .01.

TABLE 2

ATTITUDES TOWARD PARENT-CHILD RELATIONSHIPS OF STUDENTS
WHO DID NOT TAKE COURSE IN CHILD PSYCHOLOGY
(N = 155)

                       Before course       After course     Difference
Attitude measured      Mean score   SD     Mean score   SD   in means      t
Dominating               154.36   11.12      154.24   12.29     .12       .17
Possessive                74.44  [?].66       74.21    5.04     .23       .56
Ignoring                  51.41    4.45       51.58    4.27     .17       .45
Total scale              280.05   16.23      280.21   16.74     .16       .14

Note. The higher the score, the more intense is the attitude.

TABLE 3

ATTITUDES TOWARD PARENT-CHILD RELATIONSHIPS OF STUDENTS ACHIEVING IN
UPPER AND LOWER HALVES OF COURSE IN CHILD PSYCHOLOGY

                       Before course          After course         Difference
                       Mean score     SD      Mean score     SD     in means     t
Upper half (N = 79)      275.84     14.56       268.54     14.46      7.30     5.84*
Lower half (N = 78)   [illegible]   16.11    [illegible]   15.93      8.36     5.61*

Note. The higher the score, the more intense is the attitude.
* p < .01.

TABLE 4

MEN AND WOMEN STUDENTS’ ATTITUDES TOWARD PARENT-CHILD RELATIONSHIPS
BEFORE AND AFTER COURSE IN CHILD PSYCHOLOGY

                       Before course       After course
                       (total scale)       (total scale)    Difference
                       Mean score   SD     Mean score   SD   in means      t
Men (N = 65)             281.05   15.83      272.79   15.73    8.26      5.66*
Women (N = 92)           280.14   16.50      273.02   15.91    7.12      4.98*

Note. The higher the score, the more intense is the attitude.
* p < .01.

In order to discover what relationship 
might exist between scholastic achieve- 
ment in child psychology and changes in 
attitudes, a comparison was made be- 
tween students achieving in the upper 
half of their class and those achieving in 
the lower half. The basis for comparison 
was scores obtained on objective examina- 
tions covering course content, including 
a comprehensive final examination. Re- 
sults of this analysis are summarized in 
Table 3. Initially the upper half of the 
class was more permissive than the lower 
half. Both groups changed significantly. 
The amount of change, however, did not
differ significantly between the two groups.
(The actual difference of 1.06 has a t value of .59.)

Did men and women students differ in 
their attitude changes? The answer to 
this question can be found in Table 4. 
Initially, the attitudes of both groups 
were approximately the same. Both men 
and women showed a significant decrease 
in attitude intensity. The amount of
change, however, did not differ significantly
between the two groups. (The actual
difference of 1.14 has a t value of .59.)


DISCUSSION


At the conclusion of a course in child
psychology, students expressed more
permissive attitudes toward parent-child
relationships than they had held at the
beginning of the semester. While they
changed significantly in all three kinds of
attitudes measured, their dominating
attitudes decreased the most. Why did the
greatest change occur in that particular
area? There are several factors which
should be considered in answering this
question. All of them, to some extent,
probably played a part in effecting
change.

First, it may be that for the kind of
population which made up the psychology
classes, dominating attitudes are more
susceptible to change through formal
education than are possessive and ignoring
attitudes. Most of these students were
not very far removed in time and memory
from their adolescence. Since problems of
parental domination are so outstanding
in typical adolescent-parent relationships,
it may be that recent sensitization made
these students more keenly aware of such
problems. Thus they would be ready to
react to any course content which depicted
the effects overdominating parents
have on children.

A second important factor is that the
students may simply have examined more
course content dealing with dominating
attitudes, and thus were able to change
more. Of course, this possibility is closely
related to the first. They may have
encountered more material dealing with
parental domination because they were more
ready to react to it, and therefore sought
it out more often than information dealing
with possessive and ignoring attitudes.
(The instructor of the course is satisfied
that neither his lectures nor assignments
were more heavily weighted with material
dealing with dominating attitudes than
with other kinds of parent-child
relationships.)

A third likely reason for the greatest 
change being in dominating attitudes lies 
in the nature of the attitude scale. It will 
be recalled that the dominating area con- 
tained 40 items, as compared with 20 for 
possessive and 15 for ignoring attitudes. 
Thus, the dominating area of the scale 
afforded a wider range of possible re- 
sponses, and therefore more opportunity 
to express change. Furthermore, this part
of the scale had a higher correlation with
the total scale (.86) than did the other
two attitude categories (4, p. 131). It is
also significant to note that when Shoben
compared the attitudes of parents of
problem children with parents of
nonproblem children, he found the greatest
difference in the two groups to be in
dominating attitudes (4, p. 182).

The fact that low achievers changed as 
much in their attitudes as high achievers 
confirms something teachers have strongly 
suspected for a long time: Grades don’t 
tell them all they would like to know 
about what students obtain from a
course! However, had measurements of
scholastic progress other than objective 
examinations been used, the relationship 
between change in attitude and achieve- 
ment might have been different. 

Evidently the course had the same effect
on the attitudes of men and women
students, both groups becoming more
permissive to the same extent. This is a
desirable outcome, since it is a commonplace
that both parents play important parts in
affecting their children’s development.

Considering the great amount of deliberate
emphasis on parent-child relationships
built into the course, the investigator
had anticipated greater changes in
attitudes than those discovered. But per- 
haps it is unrealistic to expect in one se- 
mester much more change than this. 
Then, too, these measured changes may 
be only the initial stimulation for greater 
changes which will take place later in the 
students’ development. 

In any case, however, studies like this 
one would probably be more meaningful 
if they measured more specific attitudes 
than those represented by Shoben’s three- 
fold classification. The investigator is now 
in the process of doing this by using an 
inventory recently developed by Schaefer 
and Bell (3). Their instrument is an im- 
provement over Shoben’s in several re- 
spects, an important one being that its 
attitude areas, delineated on the basis of 
factor analysis, are more specific and ho- 
mogeneous than the areas used by Shoben. 
It is hoped that others interested in meas- 
uring attitudinal outcomes of child psy- 
chology will also try out this instrument. 


SUMMARY


Will an undergraduate course in child 
psychology change attitudes toward par- 
ent-child relationships? To answer this 
question, an attitude scale was adminis- 
tered to 157 students before and after 
the course. Postcourse attitudes were sig- 
nificantly less dominating, possessive, and 
ignoring. The greatest change was in dom- 
inating attitudes. Scholastic achievement 
in the course was not significantly related 
to the amount of change. Men and women 
did not differ significantly in how much 
they changed. As a control measure, the
same scale was administered to 155 students
before and after a course in introductory
sociology. No significant change in attitudes
occurred.
Air Force Personnel and Training Research Center


Though the Taylor-Spence anxiety-drive
concept has received considerable
attention in the experimental literature,
there is a significant lack of information
pertaining to the effect of this variable
on psychometric performance. The thesis
of this study, in accordance with evidence
available in the experimental literature, is
that manifest anxiety level may be shown
to have a differential effect on various
psychometric measures; and further, that
this differential effect may be shown to
vary with the number of alternatives
available for decision-making, that is, with
the amount of habit interference possible
within the task. The anxiety-drive hypothesis
essentially states that the higher the anxiety
score on the manifest anxiety scale (6),
the stronger the excitatory potential and
the greater the response strength. For
example, a group scoring high on the scale
has been demonstrated (3, 4, 5) to be
superior to a low scoring group in the
amount of conditioning exhibited. In more
complex learning situations, however, in
which there are a number of alternative
competing tendencies, it has been shown
(1) that the effect of increasing a drive
would depend upon the initial response
hierarchy and the relative habit strength
of the correct or goal attaining response in
the hierarchy. Differences in drive level
may lead to superior performance by
either the anxious or the nonanxious group
depending on the difficulty of the choice
points.

[Footnote, partly illegible: ... work done under ... Task No. 17011, ...
and Development ... Personnel an... Lackland Air ...]

It is therefore assumed that there would 
be no significant difference in the per- 
formance of anxious and nonanxious Ss 
on simple speeded measures which pose 
problems not having alternative com- 
peting tendencies. However, when the test 
problems offer a minimal number of al- 
ternative competing tendencies and thus 
are parallel to the experimentalist’s con- 
ditioning problem in terms of possible 
habit interference, the performance of 
anxious Ss will be superior to that of non- 
anxious Ss. Furthermore, when the meas- 
ures become quite complex and each 
problem offers a number of alternative 
competing tendencies, the performance of 
nonanxious Ss will be superior to that of
anxious Ss.


METHOD 


Measures 


Four measures were selected to test the 
hypotheses. Answer Sheet Marking Test 
B1710AX was chosen as a simple repeti- 
tive task which does not bring forth any 
alternative competing choices for solution 
of the problems. The test is used to de- 
termine how quickly and accurately Ss 
can locate and mark designated responses 
on a 15-choice IBM answer sheet. There 
are two parts, each presenting 75 ran- 
domly distributed items. 

Army Clerical Speed ACS-2 was selected 
as a measure in which the test problems 
offer a minimal number of alternative 
competing tendencies. The test presents 
125 four- to seven-digit numbers paired 
with either their reversals or a reasonable 
approximation. The task is to state 
whether an exact reversal of the stimulus
number is presented. 

A Code Learning Test was developed
to parallel the complex learning situations
often used in experimental laboratories.
The test utilizes a tape recorder and IBM
answer sheets for group administration,
and teaches the first 10 letters of
International Morse Code by presenting each
signal twice, testing at the end of the
sequence, and repeating the procedure 10
times. Part scores were obtained for each
of the 20-item subtests. All Ss reporting
previous experience with the code were
eliminated from the experiment.

A simple, nonspeeded arithmetic achievement
test was selected as a complex measure
of overlearning. California Achievement
Test II, subtest 4, section D, of
parallel forms AA, BB, and CC was used.
Each form presents 20 addition problems
varying in complexity from the summation
of 2 two-digit to 4 four-digit whole numbers,
decimals, fractions and complex numbers
when presented in columnar and linear
array. Scores were obtained for each
parallel form of the test.


Subjects 


All Ss were newly inducted basic airmen 
entering Lackland Air Force Base from 
November 29, 1955 to January 20, 1956. A 
total of 723 Ss were administered a modi- 
fied form of the Taylor scale which in- 
cluded the 50 anxiety items of the original 
scale plus the Lie and Manic scales from 
the Minnesota Multiphasic Personality In- 
ventory. The Lie scale was used to detect 
false scores, a score of eight or above 
eliminating an S from selection. The anx- 
ious and nonanxious groups each consisted 
of those whose scores fell respectively in 
the upper and lower 20% of scores for a 
standardization population of about 1000 
basic airmen. The cutoffs were 21 and 
higher, 10 and lower respectively on the 
manifest anxiety scale. There were 178 Ss
in the anxious group and 159 Ss in the
nonanxious group.
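
The selection rule described above can be written out as a short sketch. The cutoffs (21 and higher, 10 and lower, Lie score of eight or above) are taken from the text; the function and constant names are hypothetical, and the order in which the Lie screen and the anxiety cutoffs were actually applied is an assumption.

```python
from typing import Optional

ANXIOUS_MIN = 21        # manifest anxiety score of 21 and higher
NONANXIOUS_MAX = 10     # manifest anxiety score of 10 and lower
LIE_SCORE_CUTOFF = 8    # Lie score of eight or above eliminates an S


def assign_group(anxiety_score: int, lie_score: int) -> Optional[str]:
    """Return 'anxious', 'nonanxious', or None if S is not selected."""
    if lie_score >= LIE_SCORE_CUTOFF:
        return None     # possible false score; S is not selected
    if anxiety_score >= ANXIOUS_MIN:
        return "anxious"
    if anxiety_score <= NONANXIOUS_MAX:
        return "nonanxious"
    return None         # middle of the distribution; not used


print(assign_group(25, 3))   # anxious
print(assign_group(8, 2))    # nonanxious
print(assign_group(15, 0))   # None
```

Of the 723 Ss tested, those falling between the two cutoffs or failing the Lie screen drop out under this rule; the text reports only the resulting 178 anxious and 159 nonanxious Ss.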

Anxious and nonanxious Ss from each 
flight of approximately 60 men were pre- 
sented the above measures in the order 
described.


RESULTS AND DISCUSSION 


The means and standard deviations for 
anxious and nonanxious groups on the 
Answer Sheet Marking and Army Clerical 
Tests are presented in Table 1. The analysis
of variance technique was used to
test for significant differences between
performances of the two groups on the complex
learning and achievement measures.
This information is presented in Table 2.

In all cases the hypotheses were sub- 
stantiated. There was no significant differ- 
ence in the speed and accuracy with which 
anxious and nonanxious Ss could locate and 
blacken specified spaces on an IBM answer 
sheet. The mean performance score for 
anxious Ss was significantly higher than 
that of nonanxious Ss on the Army Clerical 
Speed Test. On the more complex activities,
the Code Learning Test and the
arithmetic achievement tests, the mean
performance scores of nonanxious Ss were
significantly higher than those of anxious Ss.
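
The t ratios in Table 1 can be checked from the tabled means, standard deviations, and group sizes alone. The sketch below assumes a pooled-variance t test for independent groups; the article does not say which form of t was used, and the function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-groups t from summary statistics (pooled variance)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error


# Anxious (N = 178) versus nonanxious (N = 159), values from Table 1.
t_marking = pooled_t(109.51, 23.92, 178, 112.04, 23.24, 159)
t_clerical = pooled_t(70.70, 27.75, 178, 63.34, 26.47, 159)
print(round(t_marking, 2))    # about -0.98
print(round(t_clerical, 2))   # about 2.48
```

Under this assumption the computed values agree in size with the tabled .98 and 2.48; the negative sign on the first simply reflects the slightly higher nonanxious mean on Answer Sheet Marking.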

Performance on the Army Clerical Speed
Test parallels the results obtained with
conditioning problems in the experimental
laboratories. On the more complex
activities, the code learning and arithmetic
achievement tests, performance parallels
the results obtained with more complex
learning situations in the experimental
laboratories. Since these results parallel
those found with conditioning and complex
learning situations, the appropriate
placement of the control task on a complexity
continuum is open to question. The amount
of habit interference possible within this
task does not seem as great as within the
Army Clerical Speed Test. Yet performance
on the Army Clerical Speed Test
paralleled the results of conditioning
experiments. Either the control task is
misclassified on a complexity continuum or
the Taylor Scale does not function as a
measure of drive in this situation. A study
has been initiated to provide the answer
to this question.

TABLE 1

MEAN PERFORMANCE SCORES

                                        Groups
                         Anxious (N = 178)   Nonanxious (N = 159)
Measure                    Mn       SD          Mn       SD           t
Answer Sheet Marking     109.51    23.92      112.04    23.24        .98
Army Clerical Speed       70.70    27.75       63.34    26.47       2.48*

* Significant at the .02 level.

TABLE 2

ANALYSIS OF VARIANCE FOR TESTING SIGNIFICANT DIFFERENCES
BETWEEN PERFORMANCE OF THE TWO GROUPS

                                          Code Learning Tests    Calif. Achiev. Tests
Source                                     df      Variance        df      Variance
Between groups                              1       253.09*         1       114.08*
Between trials                              9      1306.94*         2       130.99*
Between individuals within groups      [illegible]             [illegible]
Groups by trial interaction            [illegible]             [illegible]
Residual                               [illegible]             [illegible]

* Significant at the .01 level.

Previous studies have shown that an
extremely high scoring group on a manifest
anxiety scale is constantly superior to
a low scoring group in the amount of
conditioning exhibited but that in more
complex learning situations a high drive level
could result in an impairment of performance.
In this study a control task was also
introduced, a situation in which there was
no significant difference between the
performance of anxious and nonanxious Ss.
Thus the differential effect of manifest
anxiety on performance was shown with
psychometric variables to be a function
of the number of alternatives for
decision-making given to S. Since results were
obtained which are consistent with those
found in experimental literature, the study
has demonstrated a parallelism between
psychometric measures and some types of
variables studied in experimental
laboratories.

SUMMARY 


Groups of anxious and nonanxious Ss, as 
determined by a scale of manifest anxiety, 
were administered psychometric tasks 
which varied in the number of alternatives 
for decision-making given to S. There was 
no significant difference in the performance 
of the two groups on a control measure 
which did not provide S with alternatives 
for decision-making. When the task pro- 
vided a minimal number of alternatives, 
mean performance of the anxious group 
was superior. When the task was complex 
and a number of alternatives were made 
available, mean performance of the non-
anxious group was superior. Results were
obtained which are consistent with those 
found in the experimental literature, thus 
demonstrating a parallelism between psy- 
chometric measures and some types of 
variables studied in experimental labora- 
tories.

It is usually agreed that the socio-
economic status of the family and com-
munity in which children and youth grow
up affects their learning behavior and 
school achievement (e.g., 2, 3, 5, 10).
Davis, for example, explains the influence 
of social class in these words: “By de- 
fining the group with which an individual 
may have intimate clique relationships, 
our social class system narrows his train- 
ing environment. His social instigations 
and goals, his symbolic world and its 
evaluation are largely selected from the 
narrow culture of that class with which he 
can associate freely” (2, p. 609). 

In the present study, academic aptitude 
of students in Illinois high schools is com- 
pared with some of the socioeconomic 
characteristics of various communities in 
which the schools are located. Variables 
chosen for this purpose were those for 
which there was ready-to-hand informa- 
tion and which tended to “narrow the 
culture” of the people in the community. 
They are the distance from large towns 
and cities, community population, dis- 
tance from the nearest active coal mine, 
the value of farm products sold, the value 
of land and building per farm, the school 
size, and whether the school is public or 
private. Obviously, these are not direct
measures of the psychological effects of
family and community but rather are
rough indices which we presume will be as-
sociated with psychological processes which
in turn are related to academic aptitude. 
While these variables are not of the kind 
to please the theoretician, they have the 
advantage for the practical worker that 
the necessary information is readily avail-
able.

Although this study was made only of 
schools within the state of Illinois, there is
evidence that the relationships reported
here would be typical of those throughout 
the nation. In a nationwide sampling of 
high schools, Mollenkopf and Melville (7) 
found that academic aptitude and achieve- 
ment were related to such variables as re- 
gion (South or non-South), community 
size, percentage of fathers who were high 
school graduates, instructional support per 
pupil, percentage of support from state 
aid, and whether the region served by a 
school had a public library. 

Somewhat similar results were obtained 
by Thorndike (8) in his study of com- 
munity variables as predictors of intel- 
ligence and academic achievement. Cor- 
relations between IQ and various indices of 
the education of adult population ranged 
from .33 to .43 and between IQ and differ- 
ent measures of cost of housing from .30 
to .32. A correlation of .28 was obtained 
between IQ and proportion of native-born 
whites, one of —.26 between IQ and rate 
of female employment, and one of .28 
between IQ and frequency of professional 
workers in community. 

In the present study, the academic ap- 
titude of the students was measured by 
the portion of the Differential Aptitude 
Tests (1) which is used as a part of the 
Illinois Statewide High School Testing 
Program. The score used is the Total 
score from the Abstract Reasoning and 
Verbal Reasoning Tests. This score will be 
referred to as the “D.A.T.” in this paper. 
We will also talk of the Illinois Statewide 
Testing Program as the “Program.” 

¹ This prepublication memorandum is
cited with the kind permission of the au-
thors.

From the 1955-56 Program we have
figures available showing the mean D.A.T.
for each school computed from the per-
centile scores of the pupils in that school.
Using this information, we chose as our 
“dependent” variable an extreme dichot- 
omy; namely, whether the school mean 
lies in the top quarter of all schools or in 
the bottom quarter. The second and third 
quarters are ignored.² It should be re-
membered that not all high schools in the
state appear in the Program and corres- 
pondingly in this analysis. The schools in- 
cluded are those participating in the State- 
wide Testing Program and having school 
means on the D.A.T. falling either in the 
highest or the lowest quarters. 
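
The extreme dichotomy described above can be illustrated with a minimal sketch. The school names and mean scores are hypothetical, and the quartile cut points use the default method of Python's `statistics.quantiles`, which is an assumption; the source does not say how the quarter boundaries were computed.

```python
from statistics import quantiles


def label_extreme_quarters(school_means):
    """Label each school 'first' (top quarter), 'fourth' (bottom quarter),
    or None (second and third quarters, which the study ignores)."""
    # Three cut points divide the distribution of school means into quarters.
    q1, _median, q3 = quantiles(school_means.values(), n=4)
    labels = {}
    for school, mean in school_means.items():
        if mean >= q3:
            labels[school] = "first"
        elif mean <= q1:
            labels[school] = "fourth"
        else:
            labels[school] = None
    return labels


# Hypothetical mean D.A.T. percentile scores for eight schools.
example_means = {"A": 10, "B": 20, "C": 30, "D": 40,
                 "E": 50, "F": 60, "G": 70, "H": 80}
labels = label_extreme_quarters(example_means)
retained = {s: q for s, q in labels.items() if q is not None}
```

Only the schools in `retained` would enter the analysis; the middle half of the distribution is dropped, which is why no conclusions can be drawn about it.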


HYPOTHESIS


The purpose of this study was to check 
some rough indices of the socioeconomic 
characters of different communities in the 
state of Illinois (or of the schools which 
are located in those communities) against 
the mean academic aptitude of the stu- 
dents sent to schools in those communities. 
Our general hypothesis is that some of the 
factors which determine aptitudes in 
schools in various communities are to be 
found among the socioeconomic character- 
istics of those communities. We would 
suppose that, while some of this relation- 
ship is due to the efficacy with which cer- 
tain types of social situations foster the 
growth of academic ability, another part 
of the relationship is due to the self-selec- 
tion of the kinds of families which move 
into the various kinds of communities. 
Our prediction in each case below stems 
from the expectation that the more a com- 
munity is able to support its schools, the 
greater are the chances that the students 
will show the higher ranges of aptitude. 


² Since cases falling in the second and
third quarters are not examined, conclu-
sions cannot be drawn about these quarters. 
We can only draw conclusions about whether 
a variable (such as distance from larger 
towns) is associated with the school’s fall- 
ing in the first or fourth quarter.

TABLE 1
MEAN D.A.T. VERSUS DISTANCE FROM NEAREST TOWN OF 25,000 OR LARGER
(N = 167)

Distance of schools from        Quarter of mean D.A.T. of school
towns of 25,000 or larger         First     Fourth

Less than 10 miles                  47        12
Over 10 to 25 miles                 35        20
More than 25 miles                  12        41

Note. p < .001 by chi square for 2 df.


FINDINGS


A description of the socioeconomic meas- 
ures used in the study and the results 
obtained in each case follow. 


Distance from Town of 25,000 or Larger 


First examined was the distance of each 
school from the nearest town of 25,000 
or more in population. These distances 
were scaled from an ordinary current high-
way map. Communities farther from the
larger towns, we felt, would be more rural 
in character. Table 1 shows the distri- 
bution of schools in the first and last 
quarters of the mean D.A.T. scores ac- 
cording to their distance from the nearest 
town of 25,000 or more. The mean dis- 
tance from such towns is 12 miles for the 
schools in the first (highest) quarter of 
mean D.A.T. scores and 29 miles for the 
schools in the fourth (lowest) quarter. 
The chi-square test applied to the data 
of Table 1 gives a value significant at the 
.001 level and supports the prediction
made. 
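
As an illustration, the chi-square test of independence for the frequencies in Table 1 can be sketched as follows. The function name is hypothetical, and the closed-form p value used here holds only for 2 degrees of freedom; this is a check on the reported result, not a reconstruction of the authors' computation.

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Table 1: rows are distance categories; columns are first and fourth quarters.
table_1 = [[47, 12],
           [35, 20],
           [12, 41]]
stat, df = chi_square_independence(table_1)

# For exactly 2 df the upper-tail probability is exp(-chi2 / 2).
p_value = math.exp(-stat / 2)
CRITICAL_001_DF2 = 13.816  # tabled chi-square value for p = .001 with 2 df
print(f"chi square = {stat:.2f}, df = {df}, p = {p_value:.2e}")
```

The statistic comes out far above the tabled value for the .001 level, in agreement with the note to Table 1.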

It should be noted, in regard to Table 1
and all other tables, that we chose the 
categories of the independent variable so 


* We report in each case the p value which 
is the smallest the table lists for the ob- 
tained value of chi square. We report these 
values so that the reader may choose for 
himself the level of probability which he 
wishes to consider statistically significant. 


as to keep the frequencies in each category
as equal as might be while at the same time 
avoiding expected frequencies too small to 
satisfy the requirements of the chi-square 
test. 


Community Size 


A second variable used was the popu- 
lation of the communities in which the 
schools with higher and lower D.A.T. 
means were located. The assumption was 
that larger centers of population would 
be able to spend more money on their 
schools than the smaller ones. The relation 
was examined twice: once including schools 
in the Chicago metropolitan area and a 
second time excluding them. Population 
figures were taken from the highway map. 
Towns not listed on the map were assumed 
to have populations under 1,000. As shown 
by Tables 2(a) and 2(b), the expected re- 
lation is significant at the .001 level 
whether or not Chicago schools are in- 
cluded. 


Distance from Nearest Active Coal Mine 


The next prediction was that proximity 
to coal mines would tend to lower the 
mean aptitude to be found in the school. 
Table 3 shows the distribution of schools 
in the first and fourth D.A.T. quarters ac- 
cording to their distances from the near- 
est active coal mine. This variable was 
chosen because a great number of people, 
especially in the southern part of the state, 
make their living through working in the 
coal mines. Some of the farmers work in
them to earn “extra” money. Thus, exist- 
ence of coal mines becomes an important 
factor in shaping the economy of an area 
and in deciding the kind of population in- 
habiting certain localities, particularly in 
the southern half of the state. The appro-
priate information was taken from a map
of mineral industries (4). The relation
predicted is significant at the .001 level.


TABLE 2
MEAN D.A.T. VERSUS SIZE OF COMMUNITY

Population of the community     Quarter of mean D.A.T. of school
where schools are located         First     Fourth

(a) Including Chicago Schoolsᵃ (N = 167)

Under 1,000                         14        33
1,001-5,000                         25        28
5,001-25,000                        27        10
Over 25,000                         28         2

(b) Excluding Chicago Schoolsᵇ (N = 144)

Under 1,000                         14        33
1,001-5,000                         25        28
Over 5,000                          33        11

ᵃ p < .001 by chi square for 3 df.
ᵇ p < .001 by chi square for 2 df.


TABLE 3
MEAN D.A.T. VERSUS DISTANCE FROM NEAREST ACTIVE COAL MINE
(N = 167)

Distance from nearest           Quarter of mean D.A.T. of school
active coal mine                  First     Fourth

Less than 20 miles                  16        31
Over 20-35 miles                    18        20
Over 35-50 miles                    33        11
Over 50 miles                       27        11

Note. p < .001 by chi square for 3 df.


Value of the Farm Products Sold in the County

The value of the farm products sold in 
different counties of the state was used 
as an index of the income of the residents 
of a given locality. The analysis was made 
separately for counties containing towns 
of 25,000 or over and for counties not 
containing such towns in order to find out 
whether there was any difference between 
the two kinds of counties in this respect. 
It was thought that where value of farm
products sold was higher, urban and rural
areas would both prosper and property 
values would be high in general. It was 


TABLE 4
MEAN D.A.T. VERSUS VALUE OF FARM PRODUCTS SOLD

Value of farm products sold     Quarter of mean D.A.T. of school
in millions of dollars            First     Fourth

(a) Counties Containing Towns of 25,000 or Largerᵃ (N = 68)

0-14.99                              5         7
15.0-24.99                          30         4
25.0 and above                      17         5

(b) Counties Not Containing Towns of 25,000 or Largerᵇ (N = 99)

0-14.99                             15        29
15.0-24.99                          12        20
25.0 and over                       15         8

ᵃ p < .01 by chi square for 2 df.
ᵇ p < .05 by chi square for 2 df.


TABLE 5
MEAN D.A.T. VERSUS AVERAGE VALUE OF LAND AND BUILDINGS PER FARM

Value of land and buildings     Quarter of mean D.A.T. of school
per farm in thousands of          First     Fourth
dollars

(a) Counties Containing Towns of 25,000 or Largerᵃ (N = 68)

Less than 45.0                      31
More than 45.0                      21

(b) Counties Not Containing Towns of 25,000 or Largerᵇ (N = 99)

Less than 45.0                      17        32
More than 45.0                      25        25

ᵃ p > .10.
ᵇ p > .10.

expected that schools with the higher
D.A.T. means would be located in com- 
munities which showed the greater values 
of farm products sold during the year. 
Values were taken from the 1954 U. S. 
Census of Agriculture (9). Tables 4(a) 
and 4(b) present the relevant data. The 
expected relation was significant for both 
categories, i.e., counties with and without 
larger towns, beyond the .05 level. 


Value of Land and Building Per Farm 


As a second index of income and there- 
fore of the ability of rural people to give 
financial support to their schools, the 
average value of land and buildings per 
farm was used. This likewise was computed 
separately for counties containing towns 
of 25,000 or over and counties not con- 
taining such towns. The information was 
taken from the 1954 U. S. Census of Agri- 
culture (9). Tables 5(a) and 5(b) con- 
tain the data relevant to this question. The 
partitioning of chi square (6) was used 
to test the significance of the relations 
among the variables involved; i.e., quarter 
of D.A.T., average value of land and 
buildings per farm, and whether or not 
the county contained any town of 25,000 
and over. The only significant relation— 
at the .001 level—was that between quar- 
ter and location in counties with towns of 
25,000 or over. This latter relation is of 
little interest here since it is only another 
form of the relation between mean D.A.T. 
and the distance from larger towns. In 
brief, our expectation was not borne out 
that the mean D.A.T. of the school would 
be related to the value of farm land and 
buildings in the county. We have no 
ready explanation of the reason that this 
variable should fail to show a significant 
relation while our other variables, just as 
imprecise, do so. 


Size of School 


Another variable used was the size of 
the school itself. This information is avail-

TABLE 6
MEAN D.A.T. VERSUS SCHOOL SIZEᵃ
(N = 138)

                                Quarter of mean D.A.T. in school
Number of pupils in school        First     Fourth

1-99                                 5        23
100-299                             28        36
300-499                             13         8
500-999                             11         4
More than 1,000                     10         0

Note. The information concerning this variable was
lacking for 29 schools in the Program.

ᵃ p < .001 by chi square for 4 df (computed with the
Yates correction for continuity).


TABLE 7
MEAN D.A.T. VERSUS PUBLIC-PRIVATE VARIABLE
(N = 167)

                                Quarter of mean D.A.T. of school
Public-private variable           First     Fourth

Public schools                      66        71
Private schools                     28         2

Note. p < .001 by chi square for 1 df.


able from the files of the Illinois Statewide 
High School Testing Program. The pre- 
diction was that schools in the first quar- 
ter would be significantly larger in size 
than the schools in the fourth quarter. 
Table 6 contains the data relevant to this 
point. The prediction is supported at the
.001 level. 


The Public-Private Variable 


This measure was used as another varia- 
ble which might correlate with mean ap- 
titude in the school. It was predicted that 
the ratio of private schools to public 
schools would be significantly larger in the 
first quarter of D.A.T. scores than in the 
fourth quarter. Here one would suppose 
the effect to be due largely to the selec- 
tiveness of the private school. Table 7 pre- 


sents the relevant data. The frequencies in 
Table 7 seem at first glance very “lop-
sided.” The reason for this, of course, is
that so many more public than private 
schools participate in the Illinois State- 
wide Testing Program. Nevertheless, the 
chi-square value for the relation in Table 
7 is significant at the .001 level. 
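
Because the private-school row of Table 7 has one very small cell, a reader may wish to check the result with a continuity correction. The sketch below computes the 2 x 2 chi square both with and without the Yates correction. Two assumptions are made: the fourth-quarter count for public schools (71) is inferred from the stated N of 167, and the source does not say whether the correction was applied to this table.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi square (1 df) for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, the Yates continuity correction subtracts N/2
    from |ad - bc| before squaring."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: public, private. Columns: first quarter, fourth quarter.
# The value 71 is inferred from N = 167 and is an assumption.
uncorrected = chi_square_2x2(66, 71, 28, 2, yates=False)
corrected = chi_square_2x2(66, 71, 28, 2, yates=True)
CRITICAL_001_DF1 = 10.828  # tabled chi-square value for p = .001 with 1 df
print(f"uncorrected = {uncorrected:.2f}, corrected = {corrected:.2f}")
```

Under these assumptions both values exceed the tabled value for the .001 level, so the lopsided margins do not change the conclusion.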


SUMMARY AND CONCLUSION 


The purpose of the present study was 
to investigate some socioeconomic cor- 
relates of academic aptitude. The study 
was intended to be exploratory rather 
than precise. Academic aptitude was ex-
amined by using schools whose mean test
scores were in the upper or lower quarters 
among schools participating in the State- 
wide Testing Program of the University 
of Illinois. Findings of the study were that 
the mean academic aptitude of the school 
was strongly related to (a) the distance 
from larger towns, (b) the community pop- 
ulation, (c) the distance from active coal 
mines, (d) the value of farm products sold 
in the county where the school is located, 
(e) the size of the school, and (f) whether 
the school is public or private.

DIFFERENTIAL RETENTION OF COURSE OUTCOMES
IN EDUCATIONAL PSYCHOLOGY


WILLIAM P. MCDOUGALL 
Washington State College 


One of the paramount problems of all 
educational endeavor is that of making the 
learning experiences of students more 
lasting. Though the problem of retention 
has been studied in many different school 
subjects, relatively little research has been 
reported dealing directly with the per-
manency of different kinds of course out-
comes. The need for such evidence is
suggested by the following quotation from 
the Taxonomy of Educational Objectives. 


For the most part research on the prob- 
lems in retention, growth and transfer has 
not been very specific with respect to the 
particular behavior involved. Thus, we are 
not usually able to determine from this re-
search whether one kind of behavior is re-
tained for a longer period of time than
another or which kinds of educative ex-
periences are most efficient in producing a
particular kind of behavior. Many claims
have been made for different educational
procedures, particularly in relation to per-
manence of learning; but seldom have these
been buttressed by research findings (2, p. ).


It was the purpose of this study to
measure retention of different course out-
comes in a beginning course in educational
psychology. The outcomes examined in-
cluded: (a) knowledge and the intellec-
tual abilities and skills, (b) translation,
(c) interpretation, and (d) extrapolation.
These objectives were defined by the
Taxonomy of Educational Objectives (2),
a handbook consisting of a logical and
psychological classification of educational
goals. This handbook enables test con-
structors to define very clearly the classes
of behavior being measured in that it 
provides extensive definitions together 
with examples of test situations measuring 
the various behavioral objectives. 


PROCEDURE 


The general plan of the study involved 
the construction of tests to measure a 
variety of educational objectives in a 
beginning course in educational psychology 
at the University of Nebraska. The course, 
Human Behavior and Development, is the 
second of a two-course sequence taken by 
teacher trainees. It encompasses primarily 
the content areas of learning and evalua- 
tion. For this study, the content con- 
sidered was delimited to the materials 
studied about tests and measurements in 
order to permit more intensive and uni- 
form sampling of the objectives. 

The tests were related to the course by 
using the course syllabus and accompany- 
ing references which were used by all 
instructors teaching the various sections 
of the course. For each of the objectives 
tested, a few examples of items patterned 
after the “Taxonomy” definitions follow: 


Knowledge 


1. Which of the following is most easily 
measured by a test: (a) problem-solving 
ability, (b) study skills, (c) factual infor-
mation, (d) ability to comprehend.

2. Which of the following is an individual
intelligence test: (a) California Test of
Mental Maturity, (b) Stanford Binet, (c)
Ohio State Psychological Test, (d) Primary 
Mental Abilities. 

3. A test that places minor emphasis on 
the time limit is called a: (a) diagnostic 
test, (b) performance test, (c) survey test, 
(d) power test. 

4. Which of the following would be of 
most value in determining the typical be- 
havior of a student: (a) observation, (b) 
projective testing, (c) individual intelligence 
testing, (d) school achievement records.

Item 1 is designed to measure knowledge
of specific fact; Item 2, knowledge of a
classification; Item 3, knowledge of termi-
nology; and Item 4, knowledge of method-
ology.


Translation 


1. A major use of testing is for diagnosis. 
Which of the following test situations rep- 
resents the best example of the foregoing 
statement? (a) a comprehensive achieve- 
ment battery at the end of high school, (b) 
an achievement battery given early in the 
year, (c) an intelligence test, (d) a series 
of tests used to determine a student’s grade. 

2. If Bill scored at the 88th percentile in 
Social Service on the Kuder Preference 
Test, it would indicate that: (a) Bill got 
88% of the answers correct, (b) he has more 
ability in Social Service than 88% of his 
norm group, (c) only 12% of the norm group 
showed more interest in Social Service than 
he did, (d) that 88 out of 100 will do better 
than he did on this test.

The first exercise involves translation of 
a formal statement by requiring the student 
to identify a concrete example. The second 
item involves the translation of quantitative 
data to its corresponding verbal meaning. 


Interpretation 


Data are given below on five pupils en- 
rolled in a class of 30 ninth graders. The 
test data are based on performance at the 
end of the first semester. Read over the 
summary and then show which pupil each 
statement best fits by marking the pupil’s 
number on the answer sheet. 


                Calif. Ach. Test Performance    Teacher's Estimate
                                                of Ach. Rank
Pupil    IQ     Arith.    Read.    Lang.        in Class

1        88      9.1       8.0      8.3           20
2        99      9.7       9.6      9.5           14
3       132      9.5       9.8     10.2           12
4       138     11.8      12.3     12.0            3
5       101     10.0      10.1     10.9            4

1. The pupil who should be doing con-
siderably better in his school achievement.

2. The accuracy of the IQ seems most 
doubtful in which case? 

3. A bright student making good use of 
his ability. 

4. Teacher regards abilities too highly
according to test results.

5. Teacher’s rank most consistent with
test scores.

Each of the foregoing situations involves
the ability to deal with a configuration of
ideas or data recognizing the relationship
and relative importance of each. The infer-
ences or generalizations made from the data
do not extend beyond the data but are con-
fined to the material presented.

Extrapolation 


The five students for whom the data are 
given below are in kindergarten. These test 
data are based on test performance at the 
beginning of the second semester. After 
examining the data, indicate which pupil 
best fits each of the following statements by 
marking the number of the student on the 
answer sheet. 


                      MA on          Percentile Rank
Student     CA        Stanford       on Readiness
                      Binet          Test

1           5-10      7-4            72
2           6-4       5-4            22
3           5-10      5-5            64
4           5-8       5-6            45
5           5-6       6-10           38

Which student: 


1. Is apparently in need of stimulating 
experiences but has fairly high aptitude? 

2. Apparently comes from a very stim- 
ulating environment? 

3. Is most characteristic of the average 
for this group? 

4. Can you predict will have the lowest 
ability three years from this time? 

The first two situations require the stu- 
dent to extend the implications of the data 
to another topic or situation. The third
situation requires extension from a sample 
to a universe. The last item involves time 
dimension and requires prediction on the 
basis of the data presented. 


The tests were then administered on a 
trial basis to a group of 75 educational 
psychology students who had completed 
units on tests and measurements. The
tests were then analyzed, refined, and used
as instruments to study the retention of 
the different course outcomes. The re- 
fined tests were given as a pretest, a test 
at the completion of the course, and a re- 
test approximately four months later. 
There were 301 students who took both 
the pretest and the test, and 172 of this 
group took the retest. This latter group 
was used in the study of retention. 

The appropriateness of the tests in- 
volved in the study was examined after 
the test was administered to the trial 
test group and again after the test had 
been revised. 

The original trial tests contained ap- 
proximately 30 items measuring each ob- 
jective. The curricular validity of items 
was established by agreement among sev- 
eral instructors teaching the course in- 
volved in the study. Pooled judgment of 
several instructors was also used to assure 
that each item was correctly matched with 
the corresponding “Taxonomy” definition. 
The items were then studied after admin- 
istration to the trial test group. Item dif- 
ficulty and item discrimination were de- 
termined and substandard items were 
dropped or revised. Evidence of ambigu- 
ity in items and ineffective distractors 
were also studied and many items were 
revised or eliminated on this basis. The 
resulting refined tests contained approx- 
imately 24 items each, and when combined 
required approximately one and one-half 
to two hours for administration.

The tests were studied again at the time 
of the second testing in the retention ex- 
periment. At this time, 310 people took 
the test as part of their course final ex-
amination. Item difficulty, item discrim-
ination, and test reliability were deter-
mined. Homogeneity of behavior measured 
by the different tests was studied in two 
ways: First, the correlations of the items 
with their respective test totals were com- 
pared with item correlations using the 
total of the four tests combined as a cri-
terion. Second, an F test for departure
from homogeneity proposed by Neidt (10,
p. 390) was applied. This latter technique
indicated whether or not there is a rela-
tively greater lack of homogeneity be-
tween or among areas than within areas 
measured. The semiexternal criterion of 
course marks was correlated with the 
scores to further establish test validity. 
The correlation coefficients between each 
test and a measure of scholastic aptitude, 
the L score on the American Council 
on Education Psychological Examination, 
were computed to determine the degree to
which verbal ability was present in each 
of these tests. 

In the study of retention, the suita- 
bility of the sample of 172 students who 
took the retest was determined by com- 
paring the performance of this group with 
the performance of the group who did 
not take the retest. The degree of relation- 
ship between the scores on each test ad- 
ministration was found by computing 
correlation coefficients between the pre- 
test and the test, the test and retest, and 
the pretest and retest. The differences 
between the means of scores on each of 
the test administrations were determined
and tested for significance. Retention was 
then studied by computing the average 
percentage of gain retained for each of the 
separate objectives measured. 
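
The percentage of gain retained can be sketched as below. The formula is an assumption, since the procedure is not spelled out at this point in the text: the gain is taken as the end-of-course mean minus the pretest mean, and the retained gain as the retest mean minus the pretest mean. The scores in the example are hypothetical.

```python
def percent_gain_retained(pretest_mean, test_mean, retest_mean):
    """Percentage of the pretest-to-test gain still present at the retest.

    Assumed formula: 100 * (retest - pretest) / (test - pretest)."""
    gain = test_mean - pretest_mean
    if gain <= 0:
        raise ValueError("no gain between pretest and test; ratio undefined")
    return 100.0 * (retest_mean - pretest_mean) / gain


# Hypothetical means: 10.0 at pretest, 20.0 at course end, 17.5 at retest.
example = percent_gain_retained(10.0, 20.0, 17.5)
print(f"{example:.1f}% of the gain retained")
```

A value near 100 would indicate almost no forgetting over the interval, and a value near 0 would indicate a return to the pretest level.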


RESULTS 


Analysis of the Tests 


The test item difficulty, reported in 
terms of percentage of the group who 
responded correctly to the item, was de- 
termined for all tests. The mean level of 
difficulty for the knowledge test was
62.13%. The means for translation, inter- 
pretation, and extrapolation were 60.45, 
60.61, and 56.52, respectively. The indi- 
vidual difficulty percentages tended to 
cluster about the means and seemed to be 
well distributed with no items either being 
answered correctly or missed by 100% of
the group.


Item discrimination was determined by 
correlating each item with the total test 
score. To obtain these correlations the 
upper and lower 27% of the distribution 
are designated as the criterion variable, 
and by entering the appropriate percent- 
ages in an item analysis table (4), the 
correlations may be estimated. Such cor- 
relations indicate the tendency for stu- 
dents who make high scores on the total 
test to mark the individual item correctly. 
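
The inputs to such an item analysis can be sketched as follows. The code computes item difficulty (percentage answering correctly) and the proportions passing the item in the upper and lower 27% of the total-score distribution. It reports the simple difference between those proportions rather than the table-based correlation estimate used in the study, which is not reproduced here. The response data are hypothetical.

```python
def item_statistics(responses, item):
    """responses: one list of 0/1 item scores per student.

    Returns (difficulty in per cent, proportion correct in upper 27%,
    proportion correct in lower 27%, upper-minus-lower difference)."""
    n = len(responses)
    totals = [sum(student) for student in responses]
    # Rank students by total score; ties keep their original order.
    order = sorted(range(n), key=lambda s: totals[s])
    k = max(1, round(0.27 * n))
    lower, upper = order[:k], order[-k:]
    difficulty = 100.0 * sum(student[item] for student in responses) / n
    p_upper = sum(responses[s][item] for s in upper) / k
    p_lower = sum(responses[s][item] for s in lower) / k
    return difficulty, p_upper, p_lower, p_upper - p_lower


# Hypothetical answers of ten students to four items (1 = correct).
answers = [
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 0],
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 1],
]
difficulty, p_upper, p_lower, difference = item_statistics(answers, item=0)
```

In this made-up data set the first item is passed by every student in the upper group and by none in the lower group, the pattern expected of a highly discriminating item.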

On the combined tests, 54% of the total 
items were found to yield correlations of 
.40 or above. Twenty-nine per cent of the
total items were between .20 and .30.
Only fifteen, or 17% of the total items,
yielded correlations of less than .20. Two
items were found to yield negative cor- 
relations and were eliminated from use in 
the retention study. The above percent- 
ages were fairly characteristic of all of 
the tests with slightly more low-correla- 
tion items in the knowledge and trans- 
lation tests than in the interpretation and 
extrapolation tests. 

The Spearman-Brown and the Kuder- 
Richardson estimates of reliability are 
shown in Table 1.
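
The Spearman-Brown estimates follow from the odd-even correlations by the standard step-up formula, 2r / (1 + r) for a test of double length. A minimal sketch, using the odd-even correlations reported in Table 1 (the function and variable names are illustrative):

```python
def spearman_brown(r_half, factor=2.0):
    """Reliability of a test lengthened by `factor`, given the correlation
    r_half between comparable parts (factor=2 for split halves)."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Odd-even correlations reported in Table 1.
odd_even = {
    "Knowledge": 0.297,
    "Translation": 0.290,
    "Interpretation": 0.477,
    "Extrapolation": 0.440,
}
estimates = {test: round(spearman_brown(r), 3) for test, r in odd_even.items()}
print(estimates)
```

The rounded results agree with the Spearman-Brown column of Table 1 (.458, .450, .646, and .611).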

Apparently the small number of items 
included in each test is the major reason 
for the somewhat low reliabilities. For
evaluating the level of group accomplish- 
ment, such reliabilities may be regarded 


TABLE 1
SPEARMAN-BROWN AND KUDER-RICHARDSON
ESTIMATES OF RELIABILITY

                   No. of    Odd-even       Spearman-Brown    Kuder-Richardson
Test               items     correlation    Estimate          Estimate

Knowledge            24        .297           .458              .495
Translation          22        .290           .450              .507
Interpretation       23        .477           .646              .531
Extrapolation        24        .440           .611              .537

as acceptable according to some sources
(8, p. 609). Certainly higher reliabilities 
would be more desirable, but in this ex- 
periment the limiting factor of testing 
time would have made it extremely difficult
to include more items in the tests. 

One positive indication of homogeneity 
of the behavior measured by the different 
tests can be obtained by comparing the 
individual item correlations when using 
the respective test scores as a criterion 
with those obtained by using the total 
scores of the four tests combined as a cri- 
terion. Since the total score constitutes 
the criterion with which the item is com- 
pared, the higher the correlation the more 
the behavior measured by each item is 
like the behavior measured by the total 
test. It was noted that when all of the 
tests were combined into one single test 
score and the items correlated with this 
total, most of the correlations were re- 
duced. This reduction would indicate a 
greater heterogeneity of test content when 
tests were combined or, conversely, a 
greater homogeneity of content in the 
separate tests. It was not possible by this
method, however, to determine the degree 
to which each test is homogeneous with 
respect to each other test. To test this
hypothesis, an F test for departure from 
homogeneity was applied. Intra- and 
interarea correlations were obtained and 
averaged according to the function
$\tfrac{1}{2}\log_e\,[(1 + r)/(1 - r)]$, as necessary for sub-
stitution into the formula for computing
the F values, which is:

$$F = \frac{1 + \bar{r}_i - 2\bar{r}_e}{1 - \bar{r}_i}$$

where $\bar{r}_i$ is the average intra-area coef-
ficient and $\bar{r}_e$ is the average interarea
coefficient of correlation. The resulting
F values are shown in Table 2. Inspection 
of Table 2 shows that the resulting F 
values are significant beyond the 1% level 
of confidence between knowledge and 
translation, knowledge and interpretation,
and translation and extrapolation. The F
value for translation and interpretation 
is significant at the 5% level. The F values 
between knowledge and extrapolation and 
interpretation and extrapolation are not 
significant, although the first of these ap- 
proaches significance. The hypothesis that 
behaviors measured by the different tests 
were homogeneous with respect to each 
other can be rejected between all tests ex- 
cept knowledge and extrapolation and in- 
terpretation and extrapolation. 
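
The averaging of correlations through the logarithmic function mentioned above (Fisher's z transformation) can be sketched as follows. The correlations in the example are hypothetical, and whether the averaged value was converted back to r before substitution into the F formula is not stated in the text; the back-transformation here is an assumption.

```python
import math


def average_correlation(correlations):
    """Average correlations via z = 0.5 * ln((1 + r) / (1 - r)).

    math.atanh computes exactly this z; math.tanh converts the mean z
    back to the scale of r."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical item intercorrelations within one area.
mean_r = average_correlation([0.20, 0.60])
print(f"z-averaged r = {mean_r:.3f}")
```

For unequal correlations the z-averaged value lies slightly above the simple arithmetic mean, because the transformation stretches the scale as r grows.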

On the basis of these results the inter- 
pretation and extrapolation tests were 
combined since they did not seem to be 
performing separate functions. The re- 
sulting F values with these two tests com- 
bined are shown in Table 3. 

Inspection of Table 3 reveals that all 
values of F are significant beyond the 1% 
level of confidence. The hypothesis that 
these three tests measure behaviors homo- 
geneous with respect to each other is 
rejected. The remainder of the experi- 
ment considered interpretation and ex- 
trapolation as a single test. The resulting 
Spearman-Brown estimate of reliability 
for this test would become .773. 

A semiexternal criterion, namely final 
course marks, was employed to obtain a 
measure of empirical validity. The result- 
ing correlations between the tests and 
final grades centered about .60, demon- 
strating a high positive relationship using 
such a criterion. These correlations would 
be spurious to the extent that the tests 
used in the experiment constituted as much
as one-sixth of the final grade. 

The correlations between the L score of
the American Council on Education Psychological
Examination and each test are as follows:

Test                              r
Knowledge                        .364
Translation                      3862
Interpretation-Extrapolation     .343

It is evident from the inspection of these
coefficients that the influence of the scholastic
aptitude factor as measured by the
L score is equally present in the performance
required by the different tests.


TABLE 2

VALUES OF F FOR TESTS OF HOMOGENEITY BETWEEN TESTS

Test              Translation   Interpretation   Extrapolation
Knowledge            1.388          1.332            1.178
Translation                         1.310            1.423
Interpretation                                       1.005

Note. Required for significance, 309 and 309 degrees of freedom:
1% = 1.33; 5% = 1.22.


TABLE 3

VALUES OF F FOR TESTS OF HOMOGENEITY BETWEEN TESTS
(INTERPRETATION-EXTRAPOLATION COMBINED)

Test              Translation   Interpretation-Extrapolation
Knowledge            1.388               1.358
Translation                              1.489

Note. Required for significance, 309 and 309 degrees of freedom:
1% = 1.33; 5% = 1.22.


The Study of Retention 


The differences on scores of the 172 
students who took the retest and those 
who did not were determined and tests of 
significance applied. It was established 
that this sample was characteristic of 
the population of 301 from which it was 
taken. 

The possibility that subject matter 
learned in other courses might transfer 
was also considered. It was discovered 
that none of the students who participated 
took courses during the retention period 
that dealt systematically with the area of 
tests and measurements. Apparently it 
was safe to conclude that only incidental 
amounts of transfer, if any, would be ex-
pected.

The relationship between the three ad- 
ministrations of each test was studied. 
The resulting correlation coefficients were 
positive in all cases but not to a high 
degree. The values ranged from .274 to
.599 and averaged about .44. Such corre-
lations indicated that individuals tended
to maintain their relative rank on the 
successive test administrations. The cor- 
relations were slightly higher for the inter- 
pretation-extrapolation test than for the 
others, with an average correlation of
.542.

To determine if the differences in mean 
performance on the various test adminis-
trations were significant, a t test for
correlated data was applied. In Table 4
the differences, together with the ac-
companying t values, are shown.

It may be noted from inspection of 
Table 4 that all of these differences are 
significant at the 1% level except the 
difference between the test and retest
for the interpretation-extrapolation test.
This difference is significant at the 2% 
level. These results show that, on the aver- 
age, a significant amount of material was 
learned during the instruction period, a 
significant amount forgotten during the 
four-month retention period and at the 
end of the retention period the students 
still retained enough learning so that their 
performance was significantly different 
from that at the time of the pretest. 

The amount of material retained for
each test may also be reported in terms
of percentage of gain retained. These
percentages are also reported in Table 4.
To determine if the differences between
the percentages were significant, a t test
for correlated data was applied. The difference
of .78% between knowledge and
translation yielded a t value of .222, which
is not significant. The percentage difference
between knowledge and interpretation-extrapolation
was 6.55, with an accompanying
t of 1.985, which is significant
at the 5% level. The difference of 5.77%
between translation and interpretation-extrapolation
yielded a t of 1.748, which
is significant at the 10% level of confidence.


TABLE 4

                                                            Test
                                          Knowledge    Translation    Interpretation-
                                                                      Extrapolation
                                          (N = 172)    (N = 172)      (N = 172)
Pretest Mean (M_P)                          11.90        11.05          23.13
Test Mean (M_T)                             15.55        13.83          28.31
Retest Mean (M_R)                           14.55        13.09          27.23
M_T - M_P (Gain)                             3.65         2.78           5.18
  t                                         10.42        13.29          12.33
M_T - M_R (Loss)                             1.00          .74           1.08
  t                                          3.33         2.74           2.57
M_R - M_P (Gain Retained)                    2.65         2.04           4.10
  t                                         10.19         8.16           9.11
(M_R - M_P)/(M_T - M_P)
  (Percent of Gain Retained)                72.60%       73.38%         79.15%

Note. Required for significance, 171 degrees of freedom: 1% = 2.58; 2% =
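
The gain, loss, and percentage of gain retained in Table 4 follow directly from the three means for each test. The short sketch below recomputes them from the reported means; the function and variable names are illustrative only and do not come from the original study.

```python
# Means reported in Table 4: (pretest, test, retest) for each test.
means = {
    "Knowledge": (11.90, 15.55, 14.55),
    "Translation": (11.05, 13.83, 13.09),
    "Interpretation-Extrapolation": (23.13, 28.31, 27.23),
}


def retention_summary(pretest, test, retest):
    """Return gain, loss, gain retained, and percent of gain retained."""
    gain = test - pretest          # learned during instruction
    loss = test - retest           # forgotten during the retention period
    retained = retest - pretest    # gain still present at the retest
    percent_retained = 100.0 * retained / gain
    return gain, loss, retained, percent_retained


summary = {name: retention_summary(*m) for name, m in means.items()}
for name, (gain, loss, retained, pct) in summary.items():
    print(f"{name}: gain={gain:.2f} loss={loss:.2f} "
          f"retained={retained:.2f} percent={pct:.2f}")
```

The differences between the recomputed percentages (.78, 6.55, and 5.77) agree with those tested in the text.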

The course of learning and retention for 
each behavior studied may also be ex- 
pressed in terms of percentage of items 
answered correctly at each testing. 

The greatest gain and relatively the 
greatest loss were made on the knowledge 
test, the percentage of items correct in- 
creasing during the course from 49.6% 
to 64.8% and dropping off to 60.6%. The 
average percentage of items correct for 
translation began with 50.2%, increased to 
62.4% and dropped to 59.5%. The cor- 
responding percentages for interpretation- 
extrapolation were 49.2%, 60.2%, and 
57.9%. 


Discussion 


The results of this study indicate the 
need for carefully delineated course ob- 
jectives. The homogeneity analysis in this 
experiment showed that tests constructed 
to measure certain behavioral outcomes 
apparently perform separate functions as 
evaluation devices. Thus, to insure that 
multiple course outcomes, in line with the 
objectives of instruction, are achieved, it 
becomes necessary to design evaluation 
instruments to accomplish these separate 
functions. These results tend to agree with 
the results of previous studies done by 
Tyler (13), McConnell (9), Johnson (7), 
Brown (3), Horrocks (6), and Bedell (1), 
all of which make it apparent that the 
achievement of one objective cannot be 
inferred from the achievement of another. 
Remmers has expressed this point when
he concluded (11, p. 31): “. . . the educator
must clearly define each objective
in terms of the measure of its attainment.
The attainment of a particular objective
cannot be inferred from measured attainment
of another objective.”

The majority of studies reported in the
literature suggest much of what is learned
in school is forgotten. It has long been the
concern of educators to provide learning
experiences of more permanent value.
From this standpoint, the results of this 
investigation suggest that increased em- 
phasis on some of the higher levels of 
understanding such as interpretation and 
extrapolation will lead to more economical 
learning. As the authors have defined the 
objectives in the “Taxonomy,” each higher 
level of intellectual ability is built on and 
includes the previous levels. To emphasize 
such abilities as interpretation and ex- 
trapolation means that the possession of 
knowledge and the ability to translate 
it will be a part of the learning experience, 
but that understanding will go beyond 
these lower levels of intellectual endeavor 
and involve mastering more permanent 
abilities and skills. Such practices have 
not always been the case. Tyler (13) 
found that interviews with college stu- 
dents indicated that more than 60% of 
the students in college believe their chief 
duty is to memorize information. Tyler 
stated that the emphasis given to recall 
of fact in the typical college examination 
is one of the chief reasons for the exis- 
tence of this belief. 

It has been previously shown in studies 
done by Tyler (12), Wert (14), and 
Frutchey (5) that such outcomes as the 
ability to apply principles to new situa- 
tions and interpret new experiments dem- 
onstrated much higher degrees of per- 
manency than abilities involving only the 
recall of specifics. The results of the pres-
ent experiment agree in general with what
has been previously done. They also suggest
that such a device as the “Taxonomy”
will enable us to do a far more systematic
and communicable job in studying dif-
ferent outcomes of instruction.


SUMMARY 


The purpose of this experiment was to 
study the differential retention of certain 
course outcomes in a beginning educational
psychology course. 

Tests were constructed to measure four 
different behavioral outcomes in the con- 
tent area of tests and measurements. These 
outcomes were: (a) knowledge, (b) trans- 
lation, (c) interpretation, and (d) ex- 
trapolation. As a result of a homogeneity 
analysis of the behaviors measured by 
these different tests, it was found de- 
sirable to combine the interpretation and 
extrapolation tests in that they seem to 
be performing a similar measurement func- 
tion. 

The tests were administered as a pre- 
test before the units on tests and measure- 
ments were studied, at the completion of 
the units, and a third time after approxi- 
mately four months had elapsed. The re- 
sults of the study of retention indicated 
that the abilities to interpret and extrapo- 
late were retained to a significantly greater 
degree than the ability to recall knowledge 
or translate this knowledge from one form 
to another. It was concluded that there 
was differential retention among the be- 
havioral objectives measured. 

Educators have become increasingly
concerned with the relationship between 
social status and educational outcomes. 
Numerous studies of this relationship have 
been reported. The findings have demon- 
strated that social status is related to 
practically all educational experiences. 

The relationship between social status 
and intelligence and achievement has been 
emphasized in these studies (7, 14). Other 
studies have involved personality (2, 6), 
extracurricular activities (9, 15), social 
acceptance (1, 11), honors received (1), 
attitudes (8, 10), and morale (3). Re- 
sults usually indicate that upper status 
pupils exceed lower status pupils on 
achievement and intelligence test scores, 
marks in school, adequacy of adjustment, 
number of activities, social relationships, 
and attitudes toward school. The differ-
ences are generally statistically significant.

In the present study, the relationship
between specific attitudes toward school
and level of income was investigated. The
purpose was to ascertain on which of a
number of attitudinal items pupils varied
in their responses when they were divided
into three income groups.


PROCEDURE 


A questionnaire, containing a morale
scale and a “house and home” scale, was
administered to approximately 3,000 pu-
pils in nine central and south central
Indiana high schools. The morale scale,
constructed as part of another study (3),
contained 27 attitudinal items. The items
pertained to the school, teachers, school
program, other pupils, and the value of
education. Each item in the scale was
stated as a question, and was followed by
a list of five possible responses. The re-
sponses reflected (a) a very favorable at-
titude, (b) a favorable attitude, (c) a
neutral (neither favorable nor unfavor-
able) attitude, (d) an unfavorable atti-
tude, and (e) a very unfavorable attitude.
Following is an example of a typical item 
and list of responses. 

Item: What is your general opinion of 
the other boys and girls in your high 
school? 

——a. They are the best group of boys
and girls in the world!

——b. I feel that we have a good group
of boys and girls in our high school.

——c. Some of the other students are
all right; some are not.

——d. I feel that this high school has
a poor group of boys and girls.

——e. They are the worst group of
boys and girls in the world!

Pupils were instructed to check the re- 
sponses with which they agreed most 
closely. 

An indication of income level was ob-
tained from a “house and home” scale.
This scale listed seven things either found
in the home or provided for the pupil.
The items were a vacuum cleaner; an
electric or gas refrigerator; a bath tub or
shower with running water; two automo-
biles (excluding trucks); lessons in drama,
art, expression, dancing, or music pro-
vided outside of school; an automatic
dishwasher; and a cabin or cottage for
vacations. Pupils checked the items which
applied to them.

The “house and home” scale has been
used extensively by Remmers and others
(12) in the Purdue Opinion Panel studies
to divide pupils into income groups (e.g.,
10). Elias (5) and Remmers and Kirk
(13) have reported on the validity of the
scale.


TABLE 1

NUMBER AND PERCENTAGE OF PUPILS WHO CHECKED ITEMS ON HOUSE AND HOME SCALE,
AND NUMBER AND PERCENTAGE OF PUPILS IN EACH INCOME GROUP

Number of         Number and percentage
items checked     of pupils checking
                     N        %            N        %      Income Group

     0               11       1.2
     1               63       7.2         219      24.9    Low
     2              145      16.5
     3              270      30.8
     4              288      32.8         558      63.6    Middle
     5               83       9.5
     6                9       1.0         101      11.5    High
     7                9       1.0

   Total            878     100.0         878     100.0


A sample of 878 cases was selected from 
the returns. The sample included 100 
questionnaires, selected randomly, from 
the six larger schools, and all useable 
questionnaires from the three smaller 
schools. 

Based on the number of items checked 
on the “house and home” scale, Ss were 
divided into three income groups. The 
high income group included Ss who 
checked five to seven items. The middle 
income group included those who checked 
three or four items. And the low income 
group included those who checked two 
or fewer items. The number and per- 
centage of pupils who checked each num- 
ber category, and the number and per- 
centage of pupils in each income group are 
shown in Table 1. 
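
The grouping rule can be checked against the frequencies in Table 1 with a short sketch. The cut-offs are those stated above (two or fewer items, three or four items, five to seven items); the function and variable names are illustrative.

```python
# Number of pupils who checked 0 through 7 items, as reported in Table 1.
pupils_by_items_checked = {0: 11, 1: 63, 2: 145, 3: 270, 4: 288, 5: 83, 6: 9, 7: 9}


def income_group(items_checked):
    """Assign an income group from the number of items checked."""
    if items_checked <= 2:
        return "Low"
    if items_checked <= 4:
        return "Middle"
    return "High"


group_sizes = {"Low": 0, "Middle": 0, "High": 0}
for items_checked, n_pupils in pupils_by_items_checked.items():
    group_sizes[income_group(items_checked)] += n_pupils

total = sum(group_sizes.values())
group_percentages = {g: round(100.0 * n / total, 1) for g, n in group_sizes.items()}
print(group_sizes, group_percentages)
```

The group sizes and percentages obtained this way match the right-hand columns of Table 1.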

Responses to each attitudinal item were 
tabulated by income group. The responses 
were then combined into two categories. 
One group included favorable and very 
favorable responses. The second group in- 
cluded all other responses. 

For each item, the following null hy- 
pothesis was postulated: There is no dif- 
ference in the responses of pupils of varied 
income groups. Each of the 27 hypotheses 
was tested by the chi-square technique. 
The tests were based on a series of 2 × 3
contingency tables. The combination of
responses provided a uniform series of
tables, with a minimum expected frequency
of five in each cell.
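
As a rough illustration of one such test, the sketch below computes the chi-square statistic for a single 2 × 3 table. The cell counts are not reported in the article; they are reconstructed here, as an assumption, by applying the percentages for item E-3 in Table 2 to the group sizes in Table 1 and rounding to whole pupils. The closed-form probability used at the end holds only for two degrees of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square and degrees of freedom for a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Group sizes from Table 1 and percent favorable for item E-3 from Table 2,
# in the order high, middle, low. Counts are reconstructed, not reported.
group_sizes = [101, 558, 219]
percent_favorable = [70.3, 59.9, 45.2]
favorable = [round(n * p / 100) for n, p in zip(group_sizes, percent_favorable)]
other = [n - f for n, f in zip(group_sizes, favorable)]

stat, dof = chi_square([favorable, other])
# With 2 degrees of freedom the upper-tail probability is exp(-chi2 / 2).
p_value = math.exp(-stat / 2)
print(favorable, other, round(stat, 2), dof, p_value < 0.001)
```

For these reconstructed counts the statistic exceeds 13.816, the tabled value at the 0.1% level for two degrees of freedom, which agrees with the probability of .001 listed for item E-3.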


RESULTS AND DISCUSSION


The results of the chi-square tests are 
given in Table 2. The table also shows 
the percentage of pupils, by income group, 
who checked favorable and very favorable 
responses. The item-questions were
abridged to conserve space in the table. The
column headed “P” indicates the probabil-
ity level associated with chi-square values.

The items were divided into seven 
groups to facilitate interpretation: (A) 
Attitudes Toward Teachers, (B) Attitudes 
Toward the School, (C) Attitudes Toward 
School Program, (D) Attitudes Toward 
Appropriateness of School Work, (E) 
Attitudes Related to Future Expectations, 
(F) Attitudes Related to Social Accep-
tance, and (G) Miscellaneous Attitudes.
The letters are used in designating the 
items in Table 2. 

The data show that responses varied
significantly on relatively few items. Only
eight of the 27 hypotheses could be rejected,
six at the 1% level and two at
the 5% level. The responses among groups
differed widely, ranging from practically
no variation to extremely significant variations.
The frequencies generally varied
most on items which involved interpersonal
relationships. And they appeared
to vary least on items which the pupils
could consider objectively, with limited
emotional attachment.


TABLE 2

RESULTS OF TESTS OF SIGNIFICANCE SHOWING PERCENTAGE OF PUPILS CHECKING VERY
FAVORABLE AND FAVORABLE RESPONSES, BY INCOME GROUPS

                                                                                  Income Group
Item                                                                          High   Middle   Low    Total    P

A-1  What is your opinion of your high school teachers?                       69.3   61.6    65.8    63.6    .30
A-2  Do your teachers treat you fairly?                                       89.1   84.8    83.1    84.9    .50
A-3  Are your teachers personally interested in you?                          64.4   55.4    49.3    54.9    .05
A-4  Do your teachers “know” and understand their subjects?                   82.2   83.7    83.6    83.5    .95
A-5  How well are your subjects taught?                                       54.5   52.5    50.7    52.3    .90
A-6  Do your teachers help you sufficiently with your school work?            84.2   78.9    74.0    78.2    .20
A-7  Would you ask adults in your school for help with personal problems?     37.6   31.2    35.2    32.9    .50

B-1  What is your general opinion of your high school?                        82.2   74.0    65.8    72.9    .01
B-2  How well is your school organized?                                       67.3   69.2    68.9    68.9    .95
B-3  How satisfactory are the working and studying conditions?                38.6   40.7    38.8    40.0    .90
B-4  How satisfactory are the equipment and facilities?                       34.7   29.4    26.5    29.3    .50
B-5  How satisfactory is the grading system?                                  80.2   79.6    81.7    80.2    .80
B-6  What is your opinion of the school spirit in your school?                53.5   51.8    49.8    51.4    .90

C-1  What is your opinion of the group of subjects your school offers?        65.3   62.9    54.8    61.2    .10
C-2  What is your opinion of the number of activities in your school?         48.5   52.5    50.2    51.5    .70

D-1  Is your school work the kind of work you like . . . ?                    71.3   59.5    61.6    61.4    .10
D-2  Is your school work interesting?                                         86.1   75.1    73.5    76.0    .05

E-1  Will your school work be useful after you leave school?                  94.1   91.2    88.1    90.8    .20
E-2  Will going to high school help you get more satisfaction from living?    94.1   90.1    85.8    89.5    .10
E-3  What are your chances of getting the job you want after high school?     70.3   59.9    45.2    57.4    .001

F-1  Are you satisfied with your social life in high school?                  74.3   71.0    60.3    68.7    .01
F-2  Do the other students like you?                                          85.1   80.5    67.1    77.7    .001
F-3  How do other people in your school treat you?                            . . .  . . .   . . .   . . .   . . .
F-4  What is your opinion of the other boys and girls in your school?         71.3   65.2    49.8    62.1    .001

G-1  Are your parents interested in your high school work?                    96.0   92.1    76.7    88.7    .001
G-2  How do people in your community feel about your high school?             73.3   73.5    66.2    71.6    .20
G-3  How hard are you working or studying in high school?                     46.5   48.9    46.1    48.0    .80


Significant variations were noted for 
three of the four items in the social 
acceptance group (Group F). Low in- 
come pupils reacted less favorably than 
other pupils to their social life (Item 
F-1), to being liked by other pupils (F-2), 
and to other pupils (F-4). According to 
unpublished data (4), high and middle 
income pupils significantly exceeded others 
in the percentage who associate with fel- 
low pupils outside of school. Low income 
pupils were more likely to associate with 
youth from other schools or not in school. 

The attitudes on social acceptance 
seemed to be related to other attitudes on 
which differences in responses were ob- 
served. Low income pupils apparently are 
not as sure of parental interest in school 
work as other pupils (G-1). Whereas prac- 
tically all high income pupils indicated 
that they felt that their parents were in- 
terested in their work, only three fourths 
of the low income pupils expressed similar 
opinions. These differences were highly sig-
nificant, with P < .001.

Low income pupils also differed from 
other pupils in their estimates of the per- 
sonal interest of their teachers (A-3). The 
differences were significant at the 5% level. 
This item was the only one of the seven 
items pertaining to teachers on which re- 
sponses varied significantly. 

The item on general impression of the 
high school (B-1) was the only item re- 
lated to teachers, school, school program, 
appropriateness of work, and value of edu- 
cation on which differences were significant 
at the 1% level. In view of the homoge- 
neity of other items, it would seem that 
responses to this item were affected more 
by the nature of relationships than by the 
nature of the school and school program. 


Responses varied widely to the item on 
future employment (E-3). Over two thirds 
of the high income pupils, as compared 
with less than one half of the low income 
group, expressed favorable responses about 
getting the kind of job they want. Differ- 
ences were significant at the 0.1% level. 
This item may be related to post high 
school educational aspirations. It was 
found that one half of the high income 
pupils in the sample plan to go to college, 
as compared with less than one sixth of 
the low income group (4). 

Except for the two items mentioned pre- 
viously, responses to items on teachers 
(Group A) and school (Group B) varied 
slightly or not at all. Pupils in the three 
income groups were virtually in complete 
agreement on items related to the tech-
nical operation of the school and the tech-
nical competency of teachers.

The responses to items on school pro- 
gram (Group C), appropriateness of 
school work (Group D), and the value of 
education (E-1 and E-2) generally varied 
more than the responses to items on 
teachers and the school. The responses of 
high income pupils were more favorable 
for five of the six items in these groups, 
but, except for item D-2, variations were 
not significant. High income pupils were 
more interested in their school work than 
others (D-2), and differences were signifi- 
cant at the 5% level. The Ss responded 
uniformly to the number of activities in 
the school (C-2), even though low income 
pupils participated in significantly fewer 
activities (4). The pupils in all groups re- 
acted favorably to the value of education. 
The low income pupils, however, reacted 
more favorably to the utility value of edu- 
cation (E-1) than to the enrichment value 
(E-2). 

The low income pupils differed from
others, but not significantly, on estimates
of how people in their communities felt
about their high schools (G-2). And on the
question of how hard pupils were working
in high school (G-3), no variation among
groups was observed.


CONCLUSIONS 


The data seem to support the following 
conclusions: 

1. Responses of pupils of different in- 
come levels were more likely to vary on 
items related to interpersonal relation- 
ships than on items which involved an ob- 
jective appraisal of the school or the 
school program. 

2. The schools in the study have pro- 
vided an educational program uniformly 
accepted by pupils of the three income 
levels. They have been less successful in 
integrating all pupils into the social struc- 
ture of the school. How acceptance may be 
gained for all pupils is undoubtedly a 
perennial problem. 

3. The low income pupil is less likely to 
enjoy strong parental interest and support 
than other pupils. An immediate, prac- 
tical problem confronting the schools, 
therefore, is stimulating interest of all 
parents in school and school work. 

4. Variations in estimates of possible 
satisfactory future employment among 
pupils of varied income levels suggest that
more attention should be given to helping 
noncollege, low income pupils select, pre- 
pare for, and enter an appropriate voca- 
tion. 


SUMMARY 


When 878 pupils from nine Indiana
high schools were divided into three in-
come groups, it was found that they re-
sponded similarly to attitudinal items on
school, school personnel, school program,
and the value of an education. The re-
sponses varied significantly with income
level, however, on items related to inter-
personal relationships. The items on which
differences were observed pertained to so-
cial life, being liked by other pupils, opin-
ions of other pupils, feelings of parental
interest in school work, and personal 
interest of teachers. Although pupils re- 
sponded uniformly on specific items per- 
taining to the school, they varied signifi- 
cantly, according to income level, in their 
general impression of their schools. They 
also varied significantly in their estimates 
of being able to get the kind of jobs they 
want after they leave school. 

VALIDATION OF NEW ITEM TYPES AGAINST
FOUR-YEAR ACADEMIC CRITERIA 


JOHN W. FRENCH 
Educational Testing Service 


The College Entrance Examination 
Board undertook in 1951 a study to ex- 
plore the effectiveness of a series of new 
aptitude tests that might prove to be con- 
tributive supplements to the Scholastic 
Aptitude Test or effective substitutes for 
parts of it. 

Validity studies of the SAT itself are 
carried out routinely. Substantially all of 
these use as criteria the grades received 
during freshman year. This has been done 
mainly because of the great delay encoun-
tered in waiting for the longer-term cri-
teria. Furthermore, while students take a
considerable variety of courses in fresh-
man year, their freshman programs are
much more alike than their upperclass
programs. For this reason “average fresh-
man grades” may be not only more
quickly available but also more meaning-
ful than average grades received when the
students are working in different subject-
matter areas having different degrees of
difficulty.

It is useful to consider some hypotheses 
for the change that might occur in the 
validity of an aptitude test between fresh- 
man and upperclass years in college. The 
following conditions should lead to a de- 
crease in the validity of aptitude tests: 


1. Seniors take more varied courses than
freshmen; success in the various courses,
some easy, and some difficult, will be hard
to predict.

2. Time between testing and the measure-
ment of the criterion allows more scope
for changes to take place in the individual
students as a result of different experiences
or different rates of maturation.

3. Attrition at college cuts down the
range of ability between freshman and
senior years.


Conditions possibly leading to an increase 
in validities are as follows: 
1. A lack of adequate adjustment to col-
lege life in freshman year might introduce
extraneous influences on scholastic success.

2. More uniformly high motivation and
a more serious attitude toward work in up-
perclass years may cut down one source of
extraneous variance.

3. Emphasis on memory work in fresh- 
man year may depend on motivation or 
other factors, while the understanding and 
problem solving required in upperclass 
years may depend more upon the aptitudes 
measured by most test scores. 


It is not easy to guess at the resultant of
such factors as these. 

There have been very few studies where 
validities of High School Record and of 
College Board and other tests for fresh- 
man grades have been compared with 
validities for four-year grades. Studies by 
Dwyer (2), Brush (1), and Frederiksen 
(3) have shown, in general, that four- 
year cumulative average validities do not 
differ consistently from freshman validi- 
ties. Findings in the present study gen- 
erally confirm this conclusion. 

Even less has been done in validating 
the SAT against major field grades. An 
unpublished study carried out at Stan- 
ford University (6) shows validities of 
the SAT and high school record for cumu- 
lative average and major-field grades. The 
superiority of the high school record in 
that study and the sex differences found 
for the validity of the SAT for Social 
Science grades are not confirmed by find- 
ings in the present study. 


THE EXPERIMENTAL TESTS 


In addition to the High School Record,
SAT-V (verbal), SAT-M (mathematical),
and the CEEB English Composition Test,
the measures investigated in this study
consisted of 11 newly adapted or newly
developed aptitude tests, some with part
scores. Descriptions of the tests and re-
liabilities by the Kuder-Richardson for-
mula No. 20 are as follows:


1. Social Studies Reading. A 1,000-word 
passage by Hamilton concerning the Bill of 
Rights with questions on interpretation, vo- 
cabulary in context, and the structure of 
the passage. 25 four-choice items, 25 min- 
utes. Reliability, .66. 

2. Science Reading. A 1,200-word essay 
on “A Piece of Chalk” by Huxley with 
questions on interpretation, vocabulary in 
context, and the structure of the passage. 
25 four-choice items, 25 minutes. Relia-
bility, .71.

3. Inductive Reasoning. A spiral omni- 
bus test using items drawn from verbal, 
nonverbal, arithmetic, science, and social 
studies materials. The items were of three 
types found to measure inductive reasoning: 
analogies, series, and categories or “belong- 
ing” items. The great variety of item types 
was introduced so that the subjects could 
not develop a uniform approach or uniform 
method of solution, which would render the 
test deductive instead of inductive. 65 items, 
25 minutes. Reliability, .82.

4. Integration. This test was similar to 
conventional “artificial language” tests ex- 
cept that the rules for translation were more 
complex, and there was no premium on 
quick memory. It was developed as a test 
of one of the factors called “integration” 
in the Army Air Force Aviation Psychology 
Program (4). This is the ability to under- 
stand and follow complex directions. 15 
items, 25 minutes. Reliability, .73.

5. Sufficiency of Data. Each problem 
consisted of a question followed by two 
mathematical or quantitative facts. The 
task was to decide whether either fact, both 
together, both separately, or none were suf- 
ficient to answer the question. 30 problems, 
25 minutes. Reliability, .80.

6. Data Interpretation. This test con- 
sisted of statements related to the content 
of two sets of data: a table on the expendi- 
tures of state governments and a verbal 
exposition of a research concerning enlarge- 
ment of the thyroid gland. The task was to 
decide whether the data were sufficient to 
make each statement true, probably true, 
false, probably false, or none of these. 30 
items, 25 minutes. Reliability, .68. 

7. Visualization. Drawings indicated how 
a square sheet of paper was folded and then 


punched one or two times. The task was to 
select from five drawings the one that 
showed how the paper would look when 
opened. 20 items, 25 minutes. Reliability,
.85.

8. Best Arguments. Situations involving 
some sort of dispute were described in a 
paragraph. Subjects select one or two state-
ments constituting the best argument for
each side. Four situations totalling 21 items, 
25 minutes. Reliability, very low. (K.R. 20 
was not applicable, because the items were 
not independent from each other.) 

9. Perceptual Speed and Carefulness. The 
two parts of this test each contributed to 
the measurement of Perceptual Speed and 
Carefulness. (a) Cancellation. A page of 
random capital letters typed close, lines 
single spaced, and reproduced in red. The 
task was to draw an X over every A. Three 
minutes were allowed. (b) Picture Discrimi- 
nation. Each item consisted of three simple 
drawings of a face, two exactly alike, and 
one different in some respect. Three minutes 
were allowed. The score for Perceptual 
Speed was the number of A’s cancelled plus 
the number of faces correctly marked. Re-
liability, .94. The Carefulness score was the
inverse of a score developed by adding
omissions on Cancellation to five times the
wrongs on Picture Discrimination. (This
scoring formula operated to weight the two
parts equally in the total score.) Reliability,


10. Memory. This test had 3 parts scored 
as separate variables for validation pur- 
poses: (a) Picture Memory. A picture of a 
Venetian palace was studied for five min-
utes. Later a second picture was presented
showing the same palace with some features
changed. The students were allowed five
minutes to answer 30 true-false questions
comparing the pictures. (b) Verbal Memory.
A one-page description of 
Honduras was studied for 
Later the st 


two-hour testing session, and the response
portions took place during the last 15 min-
utes with one and a half hours of other testing
coming in between. Reliabilities for the
three parts were respectively .74, .54, and
.69.

11. General Information. This test pre- 
sented five-choice factual information items 
drawn from various fields with the intention 
of measuring interest in those fields. The 
items were selected so as to avoid informa- 
tion that would be acquired in school, but 
to include information that would be gained 
through hobby work or incidental reading, 
presumably of the student’s own choosing. 
The scores (number of items answered cor- 
rectly out of the 15 for each of seven fields) 
were treated as separate variables in the 
validation study. The fields included were: 
(a) Art and Architecture, (b) Literature, (c) 
Social Work, (d) Government, (e) Biological 
Science, (f) Physical Science, and (g) Me- 
chanical. Total items 105; total time 40 
minutes. Reliabilities for the parts were re-
spectively .53, .52, .30, .52, .52, .56, and .52.


ADMINISTRATION OF THE TESTS

Ten liberal arts colleges, all of which 
require the SAT for entrance, participated 
in the study by scheduling two hours of 
testing for their entering freshmen and 
by supplying all course grades and some- 
times the high school record. Four tests 
were administered at each of the colleges 
by combinations taken so as to provide a 
substantial number of cases for the most 
interesting of the intercorrelations. Ex- 
cept for Perceptual Speed and General 
Information, the tests were 25 minutes in 
length, a half hour including administra-
tion time. Perceptual Speed and General
Information were always given together 
as they formed a one-hour unit. Other- 
wise, the order of administration of the 
tests was varied from college to college. 
All administrations took place in the fall 
of 1951. 

THE CRITERIA

The data used were found on the tran- 
scripts of the students’ college records or 
on supplementary material provided by 
the colleges. Descriptions of the criteria 
follow. 

Cumulative college average. This was 
the over-all college grade-point average. 


Many different marking systems were 
represented, but it was not necessary to 
convert all of these to a common scale, 
because separate correlation studies were 
undertaken for each college. The Cumu- 
lative Average was computed for all stu- 
dents who had completed at least a half 
year of work. Inclusion of students who 
did not finish college introduces an im- 
purity into this criterion, because the 
grades are not earned in all years of col- 
lege on the same basis. For example, it is 
somewhat easier to earn high grades in 
senior year than it is in freshman year. 
However, to have failed to include non- 
graduates in the cumulative average would 
have sharply reduced the number of cases 
in the study and might have eliminated 
the part of the range of test scores and 
grades that is of most interest to admis- 
sions officers. 

Freshman grades. The freshman grade 
average was computed in the same way as 
the cumulative average. Freshman grade 
averages in specific course areas were also 
computed. 

Major-field grades. The major-field 
grade was computed from the grades in 
the major-field courses taken at the par- 
ticipating college during junior and senior 
years. This criterion was computed for 
graduating students only. To simplify the 
tables given in this article and to increase 
the stability of the figures, the major 
fields were grouped into three groups: 
science and mathematics, social science, 
and humanities and languages. In all cases 
validity coefficients were computed sepa- 
rately for the individual major fields and 
were averaged by using z transformations 
and weighting by number of cases.¹


Graduation-nongraduation. This was
the simple dichotomy, graduation vs. non-
graduation.

¹ For graduating students, comprehensive
examination grades in the major field were
available for 6 out of 10 of the participat-
ing colleges. However, the validity pat-
terns were found to be so much like those
for the major fields that data on this cri-
terion have been omitted from this article.

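The averaging of validity coefficients by z transformations, weighted by number of cases, as described for the grouped major fields, can be sketched roughly as follows. This is only an illustration: the function name and the sample figures are hypothetical, and the weights are taken as the raw N per field because that is how the text describes the weighting (N - 3 is a common alternative).

```python
import math


def average_correlations(rs, ns):
    """Average correlations through Fisher's z, weighting each by its N.

    rs: correlation coefficients, each strictly between -1 and 1.
    ns: number of cases behind each coefficient (used as weights).
    """
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and of equal length")
    # r -> z is the inverse hyperbolic tangent.
    zs = [math.atanh(r) for r in rs]
    z_bar = sum(n * z for n, z in zip(ns, zs)) / sum(ns)
    # Transform the mean z back to the correlation scale.
    return math.tanh(z_bar)


# Hypothetical validities for three major fields inside one area.
fields_r = [0.30, 0.42, 0.36]
fields_n = [40, 25, 60]
print(round(average_correlations(fields_r, fields_n), 3))
```

Averaging in z rather than in r keeps large coefficients from being understated, which matters little for values near .30 but more as coefficients approach 1.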
RELIABILITY OF THE CRITERIA 


For some of the colleges the grade 
average, or the major-field grades, or both, 
were computed separately by college year. 
This was done to provide a spot check 
on the estimated alternate-form reliabil- 
ity of these averages. To the extent that 
motivational or other factors change con- 
ditions during the course of the four 
years, the correlations are lowered. There- 
fore, the interyear correlations represent 
underestimations of the alternate-form 
reliability. All of the interyear correla- 
tions are based on graduating students, 
because relatively few of the nongraduates 
had a college record beyond freshman 
year. 

The average of the interyear correla- 
tion figures for average grades (computed 
with z transformations) was .71. Since the 
separate single years can be considered to 
be alternate quarters of the criterion, it 
is appropriate to apply the Spearman- 
Brown formula to estimate the reliability 
of the four-year cumulative average. The 
corrected figure would be about .91. Fur- 
thermore, since the interyear correlations 
were computed for graduating students 
only, it is also reasonable to consider a 
correction for restriction of range in or-
der to estimate the reliability of the cu-
mulative average, which is used in this
report for all students whether or not 
they graduated. The standard deviation 
of the cumulative average for graduating 
students was found to be on the average 
about 25% less than that of the cumula- 
tive average for all students. The correc- 
tion for restriction of range would raise 
the reliability figure still farther. No at- 
tempt will be made to compute the exact 
correction, because corrections from .71 
up to that level are subject to considerable 
distortion. It is clear, however, that the 
reliability of the cumulative average 


compares favorably with that of long, 
well-made aptitude tests. At the same 
time, it is well to remember that the “re- 
liability” in some colleges might be partly 
a result of “halo.” In addition, there are 
other aspects of grades such as prompt-
ness or neatness which may give them
high consistency without necessarily re- 
flecting consistent evaluation of achieve- 
ment that is considered important. 

For major-field grades, correlations be- 
tween junior-year and senior-year grades 
were used. These were found to be .65, 
.75, and .71 for the three areas respec- 
tively. No correction for restriction of 
range is applicable. However, since the 
junior and senior years may be considered 
to be alternate halves of the major-field 
grade criterion, the Spearman-Brown for- 
mula may be used in arriving at estima- 
tions for the reliabilities of the two-year 
major-field average. The corrected figures 
were .79 for science and mathematics, .86 
for social science, and .83 for humanities 
and languages. Since the best validities 
reported here or elsewhere do not ap- 
proach the limit made possible by these 
reliabilities, the theoretical best possible 


prediction of college grades is still far 
away. 
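The Spearman-Brown steps quoted in this section can be checked with a few lines. The sketch below assumes the usual form of the formula, r_k = k r / (1 + (k - 1) r), with k = 4 for single years treated as quarters of the cumulative average and k = 2 for junior and senior years treated as halves of the major-field average; the function name is illustrative.

```python
def spearman_brown(r, k):
    """Reliability of a measure lengthened k times, given reliability r of one part."""
    return k * r / (1 + (k - 1) * r)


# Cumulative average: mean interyear correlation .71, four yearly "quarters".
print(round(spearman_brown(0.71, 4), 2))  # 0.91

# Major-field averages: junior-senior correlations, two yearly "halves".
for area, r in [("science and mathematics", 0.65),
                ("social science", 0.75),
                ("humanities and languages", 0.71)]:
    print(area, round(spearman_brown(r, 2), 2))  # 0.79, 0.86, 0.83
```

These reproduce the corrected figures of .91, .79, .86, and .83 given in the text.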


THE FINDINGS


Findings With Regard to Average Grades 
and Graduation 


For the cumulative average, Table 1 
summarizes for the ten colleges the validi- 
ties of the SAT, High School Record, 
CEEB English Composition Test, and 
the experimental tests. To a considerable 
degree the validities are comparable from 
college to college. The only real exception 
to this is the large size of the validities of 
the experimental tests at College J. Com- 
parisons among the validities of the tests 
are commented upon in a later paragraph 
when results from the several colleges are 
pooled.

TABLE 1
SUMMARY TABLE OF VALIDITIES FOR CUMULATIVE AVERAGE

                                      College
Variables    A1    A2    B     C     E     F     G     H     I     J
N            449   172   154   870   222   246   118   481   579   190
SAT-V        .45   .56   .32   .44   .41   .41   .61   .39   .41   .43
SAT-M        .32   .21   .18   .31   .22   .27   .34   .22   .26   .27
High School Record -39 AT 62 .63 | .42 
English Composi- -40 30 
tion Test 
Social Studies .38 .35 42 
Reading 
Science Reading -28 .36 | -25 
Inductive Reason- .23 21 .49 
ing 
Integration -27 -24 p 
Sufficiency of Data .32 -28 d 
Data Interpreta- -24 -37 | -28 -30 
tion al ao 
Visualization . : 
Best Arguments 14 .13 .30 | .12 ; 1 
Perceptual Speed pad 15 18 +18 or 
Carefulness 05 -00 9 | 200 = 
Picture Memory E a 16: E 22 
Verbal Memory -23 A R TO 
Number Memory AE i 5 | < z 
Art Information 37| 2% 28| .13 a 
Literature Infor- -36 .40 37 -21 
mation 
Social Work Infor- .30 .18 26 -28 23 
mation 
Government Infor- | -40| -39 4| .24 35 
mation 
Biology Informa- 27 17 22 4 09 
tion 
Physical Science -23 21 21 -10 20 
Information 
Mechanical Infor- | — -08 | —-11 02 | —.16 o 
mation 


Table 2 shows the results pooled for all 
colleges; that is, averages across colleges 
have been computed. This makes conven- 
ient the comparisons among test validities 
for nine criteria: freshman average, cu-
mulative average, graduation-nongradua-
tion, and freshman and major-field aver-
age in each of the three areas.

Some of the experimental tests, par- 
ticularly when their short length is con- 
sidered, have substantial validities for 


cumulative average. A discussion of com- 
parisons among the test validities, how- 
ever, will be more appropriate in the next 
section where statistical corrections for 
restriction of range and for test length 
are applied. 

Since freshman grades constitute part 
of the cumulative average, some similarity 
of validity coefficients for these two cri- 
teria is to be expected. However, the ex-
treme closeness of the figures in the first

TABLE 2
VALIDITIES FOR ACADEMIC CRITERIA AVERAGED OVER COLLEGES

                                                Science &      Social         Humanities
                        Fresh.  Cum.   Grad.-     Math.          Science        & Lang.
Variables               avg.    avg.   nongrad.   Fresh. Major   Fresh. Major   Fresh. Major
SAT-V 44 -43 -09 -36 -35 -43 43 -39 34 
SAT-M -29 -27 -06 37 -34 20 +26 18 23 
High School Record 46 46 -18 -41 27 -34 -29 37 29 
Social Studies Reading -37 -38 -07 -35 .29 -40 36 31 +29 
Science Reading +29 +29 -04 -23 | .21 -80 | .31 AF .29 
Inductive Reasoning -36 31 -08 +88] .19] .24] .10] .39] .20 
Integration -32 -28 .09 -30| .28| .16| .17| .98 22 
Sufficiency of Data .34 -34 -09 42) .42) .25| .13| .21| .16 
Data Interpretation +25 +29 -12 -27 | .18 | .23 s A 21 
Visualization Ari sM .01 -26 | .19| .08| .04] 106] 02 
Best Arguments -14 -15 .04 Ai -08 -16 12 12 -16 
Perceptual Speed ole 15 03 -416| .06| .07] .12] 11| .96 
Carefulness —.04 | —.05 | —.02 |—.02 |—.04 |— 03 -O1 -00 |—.08 
Picture Memory -12 -12 — 01 -13 -08 -14 -12 -05 .12 
Verbal Memory +21 21 -09 17 -07 +23 +24 -18 09 
Number Memory 06 -08 -03 i |] eit .04 | .01 .01 ea 
Art Information +25 +25 -03 «18 | .28) .27| .23] 7%] 37 
Literature Information | .35 .32 -09 -24) .26) L) .81| .27| .97 
Social Work Informa- | 24 26 -06 +18} .24] .22] .24] 116] .98 
tion 
Government Informa- | .37 -37 11 -29 | .34| .32| .33| .26| .24 
tion 
Biology Information +22 -19 -03 -23 .20 | .19 15] .10 15 
Physical Science -20 .20 .05 -24| .27/ .15| 12| 32) :18 
Information 
Mechanical —.02 | —.04 | —.02 -03 |—.02 |—.06 |—.03 |—.03 |—.04 
Information 


two columns of Table 2 shows that the 
tests that are valid for freshman grades 
are valid to much the same degree for 
upperclass grades. There is a very slight 
tendency for the cumulative average va- 
lidities to be lower than the freshman 
validities, but the change is so slight as 
to be of no practical importance. The 
lack of substantial change in the size of 
the validity coefficients suggests that the 
factors favoring downward or upward 
changes listed earlier in this article either 
are not operative or approximately bal- 
ance each other. These findings also sup- 
port the viewpoint that for use in validity 
studies the freshman grade average is a 
satisfactory substitute for the four-year 
cumulative average. 


The average validities for graduation-
nongraduation are also given in Table 2.
Apparently none of the tests in this study 
have an appreciable relationship to grad- 
uation-nongraduation, and high school 
record has very little. Before attempting 
to interpret this finding, it will help to 
look at the relationship of this criterion 
to grades. 

The correlations between graduation- 
nongraduation and grades were found for 
the 10 colleges to range from .20 to .53.
The weighted average is .44. However,
these correlations are partly accounted 
for by an artifact of the situation. A check 
of the available data confirms what is, 
perhaps, a well known fact that, for those 
students who reach senior year, grade

TABLE 3
UNCORRECTED AND CORRECTED VALIDITIES FOR CUMULATIVE AVERAGE

                       College A1             College C              College I
                       (N = 449)              (N = 870)              (N = 579)
Variables              Uncorrected Corrected  Uncorrected Corrected  Uncorrected Corrected

SAT-V (90 min.) 45 5+ 44 54 -41 .58 
SAT-M (60 min.) .32 Al -31 -40 -26 45 
SAT-V (25 min.) 45 -51 44 51 Al .55 
SAT-M (25 min.) .32 .39 .31 -38 26 42 
Science Reading +28 .37 
Soc. Stud. Reading -38 AT 
Data Interpretation -30 -46 
Sufficiency of Data .32 42 
Integration 24 RY} 
Best Arguments 15 -28 
Literature Info. -36 -53 87 57 25 51 
Government Info. -40 -60 41 62 35 62 


averages are higher in senior year than 
in freshman year. By assuming that the 
students do not work any harder in their 
senior year, it can be argued, then, that 
the grading system changes; high grades 
are easier to get in senior year. Since 
most of the nongraduating students only 
received grades early in their college ca- 
reers, when good grades were most diffi- 
cult to earn, the correlation between grade 
average and graduation-nongraduation 
would almost certainly be higher than the 
actual relationship between graduation- 
nongraduation and scholastic success. One 
measure of this actual relationship is the 
correlation between freshman grades and 
graduation-nongraduation. In this study 
the correlation between graduation-non- 
graduation and cumulative average was 
found to be .46 at College B and .30 at 
College J, while the same figures for fresh-
man average were only .25 and .15, re-
spectively. These lower figures may be too
low because of the elapse of time between 
freshman year and the time when many 
students withdraw, but they probably give 
a truer picture of the correlation between 
scholastic success and graduation-non-
graduation than do the figures for cumu-


lative average. This correlation is low, 
because so many things other than grades 
can cause a student to withdraw from 
college. 

The implication of the still lower rela- 
tionship between test scores and gradua- 
tion-nongraduation seems to be that none 
of the colleges participating in this study 
admitted many students whose aptitude 
as measured by tests was so inadequate 
as to lead to either voluntary withdrawal 
or dismissal. This was true even for col- 
leges E, G, and J, where the SAT statis- 
tics indicate that little or no selection oc- 
curred. On the other hand, to the small 
extent that grades do correlate with grad- 
uation-nongraduation, it may be said that 
withdrawal or dismissal occurs when stu- 
dents underachieve, that is, get grades 
which are lower than would be expected 
from their test scores. It is possible, of 
course, that the desire to leave college 
comes first for nonscholastic reasons and 
is followed by a drop in grades. In any 
case, the clear finding is that graduation- 
nongraduation does not serve as a pre- 
dictable criterion against which it is pos- 
sible to validate the kinds of tests tried 
out in this study.

Application of Statistical Corrections


In order to make appropriate compari-
sons between the validities of the tests,
it is necessary to apply corrections for 
restriction in range on the SAT and for 
variations in the lengths of the tests. Un- 
fortunately, it is often misleading to make 
statistical corrections, but it can be even 
more misleading to do without them. For 
this reason, figures for the reliability of 
the criteria have already been given both 
with and without corrections. Figures for 
the validities of the tests were given in 
Table 3 without corrections. Some of these 
validities will now be presented with cor- 
rections. 

Corrections for restriction in range 
compensate for the high degree of selec- 
tivity employed by some of the partici- 
pating colleges. The corrections alter the 
validity coefficients so as to equal the 
values which would have been obtained if 
the range of scores on which the validities 
were observed had been equal to that for 
the entire SAT candidate population on 
the date of the testing in March 1951. At 
this administration the standard deviation 
of SAT-V for the candidate population 
was 113, and that for SAT-M was 110. 

Correction for test length was made by 
the Spearman-Brown formula after cor- 
rection for restriction of range had been 
accomplished. The validity coefficients 
were corrected to simulate their value had 
every test been of “practical length,” de- 
fined as 10 minutes for Perceptual Speed 
and 25 minutes for all others. 
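The two corrections can be illustrated with a short sketch. The article does not print the formulas it used, so what follows is an assumption: the common single-variable correction for direct restriction of range, followed by the Spearman-Brown-based adjustment of a validity coefficient for a change in test length. For the experimental tests the restriction acted through the SAT rather than on the test itself, which calls for a three-variable correction not shown here. The function names and the sample standard deviation of 80 are hypothetical; 113 is the candidate-population SAT-V standard deviation quoted above.

```python
import math


def correct_range_restriction(r, sd_restricted, sd_unrestricted):
    """Estimate the validity in the unrestricted group (direct restriction on the predictor)."""
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 - r * r + r * r * u * u)


def validity_at_length(r_xy, r_xx, k):
    """Validity expected if a test with reliability r_xx were made k times as long."""
    return r_xy * math.sqrt(k) / math.sqrt(1 + (k - 1) * r_xx)


# Hypothetical college: observed SAT-V validity .44, SAT-V standard deviation 80.
r_full_range = correct_range_restriction(0.44, sd_restricted=80, sd_unrestricted=113)

# Hypothetical 15-minute test with reliability .52, projected to 25 minutes.
r_projected = validity_at_length(0.30, r_xx=0.52, k=25 / 15)
print(round(r_full_range, 2), round(r_projected, 2))
```

Applying the range correction first and the length correction second follows the order stated in the text.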

Either to average the validities and
then apply two kinds of corrections or to 
apply the two corrections and then to 
average the corrected figures seemed to 
be covering up the observed validities 
with too much statistical folderol. There- 
fore, it was chosen not to do any averag- 
ing where corrections were made, but to 
select a few individual college findings 
with which to illustrate the effects of the 


proper corrections. The sample findings to 
be used for this purpose concern one cri- 
terion, the cumulative average, three 
colleges, A1, C, and I, and a selection of
variables including SAT, ECT, High 
School Record, and four experimental 
tests at each college. The colleges selected 
were the three largest except that College 
H was avoided, because only relatively 
unsuccessful tests were administered there 
(see Table 1). The tests selected for each 
college were those whose average validities 
for all other colleges were the highest. The 
only exception to this rule was a limit of 
two set upon the number of information 
tests selected. This selection technique, 
which was considered to be the equivalent 
of a cross-validation, led to the selection 
of the same two information tests for all 
three colleges: Literature Information and 
Government Information. The other tests 
were necessarily different at each college, 
since none of them was administered to 
more than one of the selected colleges. 

Table 3 gives the selected data from 
Colleges A1, C, and I. For each college
the first column gives the observed cor- 
relations. The second column gives the 
same correlations after corrections were 
made for restriction in range on SAT-V 
and SAT-M and for test length. 

It is apparent that, after the corrections 
are made on these data, neither the SAT 
nor the High School Record (College I) 
stand supreme as predictors. The highest 
validity is for Government Information. 
This probably reflects the importance to 
the criterion of width of serious reading 
outside of school requirements. The valid-
ity may be as high as it is because the
student who abounds in this kind of in- 
formation would probably possess both 
the aptitude measured by SAT-V and the 
willingness to spend time in serious extra 
study that produces a good high school 
record. While something more complex 
than breadth of information may be the 
desirable outcome of a college education,
it is, nevertheless, undeniably true that
the students who can demonstrate a wide 
knowledge of facts are often the ones 
who can think most clearly and are cer- 
tainly the ones who fill up the academic 
honor roll. The runner-up tests, Literature 
Information, SAT-V, English Composi- 
tion Test, and Social Studies Reading, all 
confirm the importance of serious reading 
to college grades. 


Findings With Regard to Grades in Spe- 
cific Areas 


Table 2 compares the test validities for 
grades in three major-field areas with va- 
lidities for freshman courses in these areas. 
As in the comparison between freshman 
average and cumulative average, there is 
evident here only a very slight drop in 
most validity coefficients between the 
freshman and upper-class years. In these 
tables the freshman and upper-class cri- 
teria do not overlap as they did in the 
case of freshman and cumulative average. 
The major-field grade criteria were aver- 
ages of the appropriate course grades 
earned during the junior and senior years. 
In spite of this lack of overlap, the major-
field validities have much similarity to
the freshman validities. Here again the 
findings encourage the practice of using 
freshman grades as criteria of college suc- 
cess. 

It is, perhaps, of interest to note that 
between freshman year and the upper- 
class years the validity of the High School 
Record for major-field grades falls off
more sharply than do the validities of 
most of the test scores. It seems possible 
that the falling off of validities for High 
School Record in the case of specific 
course areas may be brought about by 
differences between general courses taken 
particularly in freshman year and the 
specialized, major-field courses taken dur- 
ing junior and senior years. The study 
techniques or other methods used to gain 
good grades in high school cannot be very 


different from those required in the more 
general college courses. However, quite 
different techniques may be required for 
specialized major-field work. 

The validity of SAT-V is highest for 
social science; that for SAT-M is highest 
for science and mathematics. For humani- 
ties and languages the SAT-V validity is 
dominant over SAT-M as it is for social 
science, but both SAT validities are lower 
than they were for social science. The 
substantially lower validity of SAT for 
humanities and language grades cannot 
be attributed to low reliability of the cri-
terion, because as shown in an earlier sec-
tion, the reliabilities for the criteria in the
three areas are, respectively, .79, .86, and
.83.

The differences in validity for the SAT 
mentioned in the last paragraph are all 
significant at about the 1% level. Differ- 
ences mentioned below in connection with 
the other measures are less significant. 
They are based on fewer cases. Even some 
relatively small differences will be men- 
tioned to draw the reader’s attention to 
differences of interest in judging whether 
further data are likely to reveal significant 
differences which will lead to useful dif- 
ferential prediction among the various 
specialized areas. 

Among the experimental tests some dif- 
ferent patterns of validity coefficients may 
be found for the three areas. Sufficiency 
of Data and Physical Science Information 
appear from these data to be superior to 
SAT-M as specific predictors of science 
and mathematics grades. While SAT-V is 
as good as any of the tests as a specific 
predictor of social science grades, Social 
Studies Reading and Data Interpretation 
are shown by these data to serve about as 
well. Government Information, as might 
be expected from the content of the test, 
has a relatively high correlation with so- 
cial science considering that there has been 
no correction for its very short length, 
but it also has a surprising validity for
science and mathematics. The only spe-
cific predictor for humanities and lan- 
guage grades is the poor general predictor, 
Perceptual Speed. A very good predictor 
of humanities and language grades, con- 
sidering its short length, is also one that 
might be expected to be suitable for this 
purpose, Literature Information. How- 
ever, this test shows no promise for dif- 
ferential prediction. 


SUMMARY 


The College Board in 1951 initiated a 
validity study of the SAT and a group of 
experimental tests at 10 colleges. This ar- 
ticle compares the validities of these tests 
for average freshman grades with their 
validities for the cumulative four-year 
average and graduation vs. nongradua- 
tion. The validities for freshman grades 
in certain subject-matter areas are com- 
pared with major-field grades in the same 
areas. 

It was found that the pattern of test
validities for the four-year criteria closely
resembles that for the freshman criteria.
These data show the high school record 
to be less good for predicting the quality 
of major-field work than it is for predict- 
ing freshman average grades. Tests of 
government and literature information 
were the most successful among the ex-
perimental tests. When corrections were
made for restriction of range and for test 
length, these two tests were actually found 
to be more valid for predicting the cumu- 
lative four-year grade average than was 
the SAT. Neither the SAT nor any of the 
experimental tests had an appreciable 
validity for predicting graduation.


A NOTE ON PART-WHOLE CORRELATION¹


FREDERICK B. DAVIS 
Hunter College 


When a correlation coefficient is com- 
puted between total scores (such as the 
Performance scores derived from the 
Wechsler Adult Intelligence Scale) and 
part scores (such as the Object Assembly 
scores) that are included in the total 
scores, the resulting coefficient is spuri- 
ously high. There has been some confusion 
in the literature regarding the source and 
amount of this spuriousness. It is the 
purpose of this note to clarify the matter. 

In deviation-score form of the original 
units of measurement, the product- 
moment correlation coefficient between 
scores on total t and on part a, which is 
wholly included in total t, may be written 
as follows: 


\[
r_{at} = \frac{\sum (a)(a + b + \cdots + n)}{\sqrt{\sum a^{2}}\,\sqrt{\sum (a + b + \cdots + n)^{2}}}
\]

Hence,

\[
r_{at} = \frac{s_a + \sum_{j=b}^{n} s_j r_{aj}}{s_t}, \tag*{[1]}
\]


where the subscript j denotes any part of 
total t except part a. 

This coefficient is spuriously high be- 
cause the errors of measurement in scores 
on part a are also in scores on total t. 
To obtain a coefficient free from this
correlation of errors in common, a parallel
form of part a, to be denoted Part A,
may be employed. Part A is not, of course,
included in total t. Then,

\[
r_{At} = \frac{s_A r_{aA} + \sum_{j=b}^{n} s_j r_{Aj}}{s_t}, \tag*{[2]}
\]

where $r_{aA}$ is the reliability coefficient of
part a.

¹ After this note had been accepted for
publication, the writer’s attention was called
to a paper by Angoff (1) which he had not
previously seen. Angoff’s equation (3) and its
derivatives constitute special cases of the
writer’s Equation [3] in the present paper.

Since parts a and A are parallel forms,
$s_A = s_a$ and $r_{Aj} = r_{aj}$. Therefore, we may
rewrite Equation [2] as:

\[
r_{At} = \frac{s_a r_{aA} + \sum_{j=b}^{n} s_j r_{aj}}{s_t}. \tag*{[3]}
\]


It is obvious from Equation [1] that

\[
\sum_{j=b}^{n} s_j r_{aj} = s_t r_{at} - s_a .
\]

Consequently, Equation [3] may be writ-
ten as:

\[
r_{At} = r_{at} + \frac{s_a (r_{aA} - 1)}{s_t}. \tag*{[4]}
\]

Either Equation [3] or [4] may be used 
to compute the product-moment coefficient 
of correlation between a total and a part 
included wholly within the total if one 
wishes to report a coefficient free from 
the inflating effect of the correlation of er- 
rors of measurement common to both. 
Equation [4] will be the more convenient
if $r_{at}$ is known.

The difference between the values of
Equations [1] and [3] or of Equations
[1] and [4] may be written as:

\[
r_{at} - r_{At} = \frac{s_a (1 - r_{aA})}{s_t}.
\]


This difference, it should be noted, is not
equal to the correlation of the errors of
measurement in scores on total t and on
part a. That correlation coefficient, ex-
pressed in terms of the difference $r_{at} - r_{At}$,
is:

\[
r_{e_a e_t} = \frac{r_{at} - r_{At}}{\sqrt{(1 - r_{aA})(1 - r_{tT})}}, \tag*{[5]}
\]

where $r_{aA}$ and $r_{tT}$ are the reliability co-
efficients of part a and total t, respectively.

McNemar (2, p. 164) and other writers
have referred to the correlation coefficient
between a part score and the remainder
of the total score as the part-whole cor-
relation coefficient corrected for spurious-
ness. But it is obvious from its very
definition that such a coefficient is really
not a part-whole correlation coefficient;
it is instead a part-remainder correlation
coefficient, it needs no correction for
spuriousness, and it may be denoted and
computed as follows:

\[
r_{(a)(t-a)} = \frac{s_t r_{at} - s_a}{\sqrt{s_t^{2} + s_a^{2} - 2 s_a s_t r_{at}}}. \tag*{[6]}
\]

Use of Equations [4], [5], and [6] may
be illustrated with data pertaining to the
relationship between Part 11 (Object As-
sembly) and the Performance total score
(the sum of Parts 7, 8, 9, 10, and 11) of
the Wechsler Adult Intelligence Scale (3).
The basic data, reported in terms of
Wechsler’s Scaled Score units for a group
of 200 eighteen-nineteen year olds, are
as follows (when $X_p$ denotes a Scaled
Score on the Performance total and $X_o$
a Scaled Score on the Object Assembly
part):

\[
\begin{aligned}
\bar{X}_o &= 10.00 & r_{op} &= .82 \\
s_o &= 2.79 & r_{oO} &= .65 \\
\bar{X}_p &= 49.43 & r_{pP} &= .93 \\
s_p &= 11.83
\end{aligned}
\]


The correlation coefficient of .82 be-
tween scores on Part 11 and the Per-
formance total ($r_{op}$) was computed di-
rectly from the data. Equation [4] yields


a value of .74 for the correlation between 
scores on Part 11 and the Performance 
total free from the spurious inflation 
owing to the perfect correlation of errors 
of measurement common to both scores. 
Equation [6] yields a value of .71 for the 
correlation between scores on Part 11 
and the sum of the remaining parts of the 
Performance total. Equation [5] yields a 
value of .52 for the correlation between 
errors of measurement in the entire Per- 
formance total and in Part 11 alone. 
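The three values can be recomputed from the basic data as a check. The sketch below restates Equations [4], [5], and [6] as they follow from the definitions in the text (the printed forms are partly illegible, so the exact layout of Equation [5] is an assumption), and the function names are illustrative.

```python
import math


def part_whole_corrected(r_at, s_a, s_t, rel_a):
    """Equation [4]: part-whole correlation free of errors common to part and total."""
    return r_at - s_a * (1 - rel_a) / s_t


def error_correlation(r_at, r_At, rel_a, rel_t):
    """Equation [5]: correlation between errors of measurement in part a and total t."""
    return (r_at - r_At) / math.sqrt((1 - rel_a) * (1 - rel_t))


def part_remainder(r_at, s_a, s_t):
    """Equation [6]: correlation of part a with the sum of the remaining parts."""
    return (s_t * r_at - s_a) / math.sqrt(s_t ** 2 + s_a ** 2 - 2 * s_a * s_t * r_at)


# Object Assembly (part) and Performance total, Scaled Score units.
s_o, s_p = 2.79, 11.83
r_op, rel_o, rel_p = 0.82, 0.65, 0.93

r_corrected = part_whole_corrected(r_op, s_o, s_p, rel_o)       # about 0.74
r_remainder = part_remainder(r_op, s_o, s_p)                    # about 0.71
r_errors = error_correlation(r_op, r_corrected, rel_o, rel_p)   # about 0.53
print(round(r_corrected, 2), round(r_remainder, 2), round(r_errors, 2))
```

The first two results agree with the .74 and .71 reported. The third comes out near .53 from these rounded inputs, slightly above the .52 given in the text, a gap that may reflect rounding at an intermediate step.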

As would be expected, the coefficients 
yielded by Equations [1], [3] or [4], and 
[6] range themselves in order of decreas- 
ing magnitude. The coefficient of .82 in- 
dicates the actual relationship of two 
partially overlapping variables—scores on 
Part 11 and the Performance total in the 
sample of 18-19 year olds. On the other 
hand, the coefficient of .74 indicates the 
relationship of two entirely separate vari- 
ables that measure the same abilities (plus 
chance) as Part 11 and the Performance 
total in the same sample of 18-19 year 
olds. This is a part-whole coefficient prop- 
erly corrected for spuriousness owing to 
the correlation of errors in common. The 
coefficient of .71 indicates the actual re- 
lationship of scores on Part 11 and the 
sums of scores on other parts included 
in the Performance total. This is a part- 
remainder coefficient. 

Of the three coefficients, the one having 
the value of .74 is most meaningful for 
comparison with the great majority of 
intercorrelations reported among mental 
tests. This is because such intercorrelations 
are ordinarily based on separate tests and 
are not inflated by correlation of errors 
of measurement in common. The co- 
efficient of .82 is of fundamental utility in 
computing variances, standard errors, etc. 
The meaning of the coefficient of .71 is 
clear, but this type of coefficient is not 
commonly of practical utility. Each of
these coefficients has its own particular
merit and the distinctions among them
should be recognized so that one will not
be confused with another.


REFERENCES

1. Angoff, W. H. A note on the estimation of nonspurious
   correlations. Psychometrika, 1956, 21, 295-297.
2. McNemar, Q. Psychological statistics. New York: Wiley, 1955.
3. Wechsler, D. Manual for the Wechsler Adult Intelligence Scale.
   New York: Psychological Corp., 1955, Tables 6, 7, and 10.

Received November 14, 1957.


THE INFLUENCE OF CONSISTENT AND INCONSISTENT
GUIDANCE ON HUMAN LEARNING AND TRANSFER¹


BERNARD M. ARONOV
University of Florida²


In 1928, Goodenough (7) reported as 
a finding of her study on anger in young 
children an apparent relationship between 
inconsistency of parental discipline and 
frequency of anger outbursts. Other stud- 
ies on the effects of consistency and in- 
consistency followed. These dealt with 
various ways in which consistency or in- 
consistency is expressed, as for instance in 
parental demands, commands, etc. (1, 2,
4, 5, 8, 12). The results of these studies 
all suggested the conclusion that incon- 
sistency in the behavior of an authority 
figure toward a child has disturbing effects 
both on the child’s immediate behavior 
and on his subsequent personality de- 
velopment. Support for this conclusion 
came from animal studies in which random 
reinforcement was a variable (11, 15, 
16). The last study on the effects of these 
variables appeared in 1952 (8), and our 
textbooks speak of the detrimental effects 
of inconsistency as established fact (e.g., 
3, 6, 9, 10, 13). 

¹ This paper is a condensation of the author’s doctoral
dissertation, completed in 1956 at the University of Florida
under the direction of Rolland H. Waters, who was also kind
enough to read and make valuable suggestions about the present
paper. The writer is also grateful to Henry S. Curtis, Morton
S. Slobin, and Stanley Spiegel for their help with this paper.

² Now at the VA Regional Office, Cleveland, Ohio.

The study here reported arose, however,
as a result of an impression that the
work done on the problem does not justify
the conviction shown with regard to the
detrimental effects of inconsistency. For,
while the data strongly support the contention,
e.g., that parental inconsistency
has damaging effects on a child’s behavior,
the nature of the studies gives reason to
question the validity of the data. The
direct data on inconsistency come from
case history and observational studies,
studies too loosely designed to control 
for the possibility that it is the person 
being inconsistent rather than the in- 
consistency itself which is causing the 
damage. Baldwin, et al. (1), for instance, 
found that inconsistency figured in the 
rejecting parent’s behavior in that dis- 
cipline, decisions, etc., were based on the 
parent’s convenience. This finding would 
suggest that inconsistency is one avenue 
through which rejection is expressed, but 
for which the inconsistency could be ir- 
relevant. The studies involving random 
reinforcement seem better controlled, but 
only one (15) includes a study of im- 
portant transfer effects, and in every 
case we have no way of knowing how far 
we can generalize from infrahuman to 
human Ss. In brief, it appears that we 
actually do not know that inconsistency 
itself has a detrimental effect on behavior. 

The intent of this study, then, was to 
isolate and study the variables of con- 
sistency and inconsistency in a controlled 
laboratory setting. The study was not 
designed to investigate the effects of pa- 
rental consistency or inconsistency. Al- 
though primary interest has been in the 
effects of parent-child inconsistency, it was 
considered important to test the specific 
effects of these variables apart from other 
conditions. An attempt was made simply 
to answer the following question: If Ss 
are given consistent or inconsistent guid- 
ance while learning to solve a maze prob- 
lem, in what ways will their learning be- 
havior be affected both in the immediate 
learning situation and later when they are
no longer being guided and are confronted
with a similar but different problem to
solve?


PROCEDURE 


Eighty-eight college students, male and 
female, under 26 years old, and essen- 
tially inexperienced with maze problems, 
served as Ss. As each S appeared at the 
experimental room, he or she was assigned 
randomly to one of three groups, known 
as Groups I, II, and C. The Ss were 
seated before a shield which obscured the 
apparatus and the experimenter. They 
were given a general description of the 
type of maze they were to learn, and told 
that their purpose was to learn to guide 
a stylus from start to goal without error. 
They were told further that when they 
reached the goal at the end of each run 
both the red and the green light suspended 
before them would flash to signal the end 
of a trial. In addition to this general 
orientation, Ss in Groups I and II were 
told that when they made a correct turn 
the green light would flash, and when 
they made a wrong turn the red light 
would flash. 

The stylus was placed in the S’s hand
and guided to the starting point of a 
standard 10-turn Warden U-type maze 
(14) employed, and the S was told to 
start. Light cues were given Group I Ss 
consistently according to instructions. Un- 
known to Group II Ss, however, the light 
cues given them were wrong at three of 
the ten choice points on each trial. Also, 
the choice points at which wrong cues were 
given were varied from trial to trial ac- 
cording to a prearranged pattern. Group 
C Ss were given no guidance and served 
as the control group.
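
The misguidance given Group II can be pictured with a small sketch. It is purely illustrative: the prearranged pattern actually used is not reported, so the seeded random schedule below is a hypothetical stand-in, and treating a "wrong" cue as the reverse of the true one is an assumption. The function names are likewise invented for the example.

```python
import random


def make_cue_schedule(n_trials=15, n_choice_points=10, n_wrong=3, seed=0):
    """Return, for each trial, the choice points (numbered from 1) that
    receive a wrong cue.

    HYPOTHETICAL stand-in for the unreported prearranged pattern: three
    points are drawn per trial, and a draw is repeated if it matches the
    preceding trial, so that the wrong points vary from trial to trial.
    """
    rng = random.Random(seed)
    points = range(1, n_choice_points + 1)
    schedule = []
    previous = None
    for _ in range(n_trials):
        wrong = sorted(rng.sample(points, n_wrong))
        while wrong == previous:
            wrong = sorted(rng.sample(points, n_wrong))
        schedule.append(wrong)
        previous = wrong
    return schedule


def cue_for(choice_point, turn_is_correct, wrong_points):
    """Light shown after a turn: green signals 'correct', red 'wrong'.

    ASSUMPTION: at a wrong-cue point the signal is simply reversed.
    """
    shown_as_correct = turn_is_correct != (choice_point in wrong_points)
    return "green" if shown_as_correct else "red"


schedule = make_cue_schedule()
for trial, wrong_points in enumerate(schedule[:3], start=1):
    print(f"trial {trial}: wrong cues at choice points {wrong_points}")
```

A Group I schedule under the same sketch is simply an empty list of wrong points on every trial, which makes the two guided conditions differ in nothing but the accuracy of the cues.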

Whether or not they reached the criterion
of one errorless run, all Ss were
required to run 15 trials, after which all
were stopped and transferred to the lateral
reverse of the practice maze pattern.
Here they were told that their task
and purpose were the same, but that the
only light cues they would see would be 
those at the end of each run. All Ss were 
then allowed to run until they reached 
the criterion of one errorless trial. Records 
were kept of errors and time per trial, and 
of trials to criterion, and notes were taken 
of spontaneous behavior exhibited. Fol- 
lowing completion of the second maze 
problem, Ss were interviewed with regard 
to their impressions of the experimental 
experience. 


RESULTS 


The quantitative results are summarized 
in Tables 1 and 2. It will be noted that no 
figures are given for trials to criterion 
on the practice maze. The reason for this 
is that since only 11 Group I Ss and eight 
Group C Ss reached criterion in 15 trials, 
it was not possible to compute mean trials 
to criterion. Instead, the percentages of 
Ss in each group who reached criterion 
were computed, and these percentages 
were compared using the chi-square 
method. 

TABLE 1
PRACTICE MAZE PERFORMANCE

Means and Standard Deviations

Group        Errors                Time
           M        SD          M         SD
I        51.77    12.43      425.30    157.83
II       86.43    13.22      668.57    324.67
C        56.50    11.66      523.70    212.54

F Ratios and ts

              Errors                Time
           F        t           F         t
I-II      1.13    10.22**      4.23*     2.76**
I-C       1.14     1.60        1.81      2.14*
II-C      1.28     9.11**      2.33      2.91**

* Significance at .05 level.
** Significance at .01 level.


TABLE 2
TRANSFER MAZE PERFORMANCE

Means and Standard Deviations

Group        Errors                Time                 Trials
           M        SD          M         SD          M        SD
I        48.10    30.72      368.20    212.27      16.06     7.46
II      109.14    68.19      663.86    482.82      32.75    25.61
C        57.00    47.98      403.93    243.75      19.40    15.16

F Ratios and ts

              Errors                Time                 Trials
           F        t           F         t           F        t
I-II      4.92*    4.34**      5.17*     2.98**     11.89*    3.32**
I-C       2.44*    0.86        1.31      0.63        4.14*    1.14
II-C      2.02     3.34**      3.92*     2.56*       2.85*    2.39*

* Significance at .05 level.
** Significance at .01 level.


For the inconsistently guided Group II,
mean total errors and time were significantly
greater than those for Groups I
and C, both on practice and transfer 
mazes. A significantly smaller proportion 
of Group II Ss reached the criterion 
within 15 trials on the practice maze 
(chi square = 14.05, significant beyond the
.01 level). Group II Ss required significantly
more trials to reach criterion on
the transfer maze than did Groups I and 
C. Group C practice maze time was sig- 
nificantly greater than that of Group I, 
but otherwise Group I performed only 
slightly and insignificantly better than did 
Group C. 

While variances did not differ signifi- 
cantly for the practice maze, they did for 
the transfer maze. Group II variances 
were significantly greater than those of 
Group I for all measures, and greater 
than those of Group C for time and trials 
to criterion. Also, Group C variances were 
significantly greater than those of Group 
I for errors and trials to criterion. 
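
The variance comparisons can be checked against the published standard deviations. The sketch below assumes that each F is the ratio of the larger to the smaller of the two group variances; the paper does not state this, but the assumption reproduces the tabled error F ratios to within rounding. The function and dictionary names are illustrative.

```python
def variance_ratio(sd_a, sd_b):
    """ASSUMED definition of the tabled F: larger variance over smaller."""
    larger, smaller = max(sd_a, sd_b), min(sd_a, sd_b)
    return (larger / smaller) ** 2


# Standard deviations of total errors, copied from Tables 1 and 2.
error_sd = {
    "practice": {"I": 12.43, "II": 13.22, "C": 11.66},
    "transfer": {"I": 30.72, "II": 68.19, "C": 47.98},
}

for maze, sds in error_sd.items():
    for first, second in (("I", "II"), ("I", "C"), ("II", "C")):
        f_ratio = variance_ratio(sds[first], sds[second])
        print(f"{maze:8s} {first}-{second}: F = {f_ratio:.2f}")
```

On the practice maze the three ratios stay close to 1, while on the transfer maze they rise to roughly 2 through 5, which is the contrast in variability the paragraph above describes.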

The behavioral data characterized 
Group I Ss as initially dependent upon 
the light cues but as gradually showing less 
dependence upon them. Group II Ss were
more characteristically confused by the
lights at first and then reacted to them 
in one of three ways: either they rebelled 
against instructions and ignored the lights, 
they were confused and ambivalent about 
them, or they followed them passively. 
Group II Ss tended also to be uneasy 
about verbalizing doubts concerning the 
accuracy and usefulness of the light cues; 
those bold enough to rebel against the use 
of the cues were quite outspoken, but at 
the other extreme those who followed 
the cues passively distorted their per- 
ceptions of the situation so far as to in- 
sist that the cues were helpful. Further, 
transfer maze performances for Group II 
were related to the degree to which Ss 
had ignored the light cues on the practice 
maze, i.e., those who ignored the lights
tended to do as well as the best in Groups
I and C, etc. Group C Ss approached the
mazes in a matter-of-fact, business-like
manner.


DISCUSSION


The excessive variance of Group II
transfer maze performance, together with
the protocol material, makes an interpretation
of the effect of the inconsistent
guidance difficult. On the one hand, the 
group performances suggest that to a 
significant degree Group II was adversely
affected by the inconsistent guidance. On 
the other hand, however, the magnitude 
of Group II variance cautions against 
drawing such a broad conclusion, because 
the inconsistency can hardly be said to 
have had a very uniform effect on Group 
II Ss. 

The results seem to become understand- 
able when Group II variance and be- 
havioral data are considered in detail. 
First, the possibility of a sampling bias 
contributing to the variability can be 
discarded on the grounds that the groups 
showed similar variability on the practice 
maze. Can it be concluded then that the 
inconsistency itself produced the variabil- 
ity, or did the inconsistency bring into 
play personal variables which determined 
individual performances? 

The behavioral data suggest the latter 
of the two possibilities. It appears that 
the inconsistency provoked three grossly 
different personal reactions, i.e., a defiant
and rebellious one, a confused and
ambivalent one, and a passive one. It ap- 
pears further that the particular personal 
reaction provoked was related to trans- 
fer maze performance. It is true that the 
inconsistent cues had the initial effect of 
confusing all the Group II Ss (perhaps 
accounting for the more similar variance 
of practice maze performance), but this 
effect did not last for those Ss who were 
able to break away from the light cues 
and to attend to cues from the maze 
itself. Lasting confusion and damage to 
performance seemed to occur primarily 
when Ss could not break away from the 
inconsistent guidance. These Ss emerged 
from the practice maze experience with 
little useful information to apply in deal- 
ing with the transfer maze. It would seem 
necessary to conclude, then, that the inconsistent
guidance had the immediate
effect of confusing the recipient, but that
its effects were temporary unless the re- 
cipient was unable to rebel against the 
inconsistent guidance. 

Some clues are present which suggest 
an explanation for this behavior of Group 
II Ss. It appears that the more ambivalent 
and passive Ss were those who seemed to 
fear offending the experimenter and/or 
being embarrassed by questioning the cues. 
For personal reasons these people seemed 
to feel uncertain enough in the relation- 
ships with themselves and/or with the 
experimenter to feel that it was important 
not to question the experimenter too 
seriously if at all—the safest reaction be- 
ing complete passivity. 

Finally, a few words might be said 
about the variance differences between 
Groups I and C, where actual perform- 
ances did not differ significantly. It is 
felt that the guidance given Group I en- 
couraged group conformity of perform- 
ance, while no guidance perhaps allowed 
Group C Ss to develop whatever poten- 
tials they had. 


IMPLICATIONS 


If the results of this study are de- 
pendable, they raise important questions 
about the origins of behavior pathology. 
Broadly speaking, the results make it diffi- 
cult to maintain the position that a par- 
ticular type of experience will affect per- 
sonality in a particular way. We are 
confronted again by that constant source 
of irritation, the intervening variable. In 
this particular instance, the effect that 
the “experience” had was apparently in- 
fluenced by how the S perceived the sit- 
uation, and that perception seemed in 
turn influenced by how secure the S felt 
in relation to himself and/or to the ex- 
perimenter. What apparently was im- 
portant here was whether the S perceived 
the situation as one in which he could 
comfortably question the misguiding in- 
formation he was receiving. 




It would seem important, then, in un- 
derstanding the origins of behavior dis- 
turbance, to study some of the intervening 
variables which could play a role in de- 
termining the effect that a particular ex- 
perience might have on the developing 
personality. With regard to the results 
of the present study, it would seem im- 
portant to know more about the variables 
which influence a person to perceive a 
situation as one in which he could or could 
not comfortably question inconsistent 
guidance being given him by an authority 
figure. In the final analysis, a study of 
such intervening variables may reveal that 
what a parent actually does or does not 
do with regard to his child is not nearly 
so important for the developing person- 
ality as is, for instance, the interpersonal 
relationship in which this act occurs. 


SUMMARY AND CONCLUSIONS 


This study was designed to answer the 
question: If Ss are given consistent or in- 
consistent guidance while learning an in- 
itial maze problem, in what ways will 
their learning behavior be affected both 
in the immediate and in a transfer situa- 
tion? On a 10-turn Warden U-type maze, 
Ss were given either consistent, incon- 
sistent, or no guidance. After 15 trials 
under one of these conditions, all Ss were 
transferred to the lateral reverse of the 
initial maze where all were required to 
run without guidance until one errorless 
run was achieved. After learning the transfer
maze, Ss were interviewed for impressions
of the experiment. The results
suggested the following statements in an- 
swer to the motivating question: 

1. The influence of consistent guidance 
is not markedly different from that of no 
guidance. 

2. While inconsistent guidance is being 
given it has a confusing and generally 
detrimental influence on learning as compared
with the influence of consistent or
no guidance.

3. Inconsistent guidance does not nec- 
essarily have lasting damaging influence 
on learning behavior. 

4. Lasting damage to learning behavior 
results from inconsistent guidance when 
the recipient of the guidance is for some 
reason unable to rebel and ignore the 
guidance.


A TECHNIQUE FOR MEASURING CLASSROOM BEHAVIOR


DONALD M. MEDLEY AND HAROLD E. MITZEL 
Division of Teacher Education of the Municipal Colleges of New York City 


One of the most difficult problems that 
must be solved before useful results can 
come from research into the relationship 
between teacher personality and pupil 
growth is that of securing objective meas- 
ures of the teacher’s personality as it 
functions in the classroom. The usual ap- 
proach to this problem has been to use 
ratings by supervisors or specially trained 
observers, but, despite all attempts to 
improve them, such ratings are still biased, 
subjective, and in many cases uninterpret- 
able by anyone, even the rater himself. 

Whatever value such ratings have arises 
from the fact that they are based on ob- 
servations of the teacher while he is 
teaching; their most serious limitations 
arise from the fact that the evaluative 
judgment of the rater intervenes between 
the behavior and the score supposed to 
reflect it. There are at least two sources 
of variation introduced here that attenuate 
the validity of the ratings by distorting 
measured differences between teachers. 
The cues upon which the observer bases 
his judgment and the relative weights as- 
signed to them are both allowed to vary 
from observer to observer to some un- 
known degree. By providing a schedule 
for recording behaviors listing the cues to 
be responded to, the first source of error 
may be virtually eliminated. By making 
the assignment of weights a clerical task 
done by someone other than the observer, 
the second may also be made negligible. 

As a part of a longitudinal study of 
graduates of the Teacher Education pro- 
gram of the municipal colleges of New 
York City (City, Hunter, Brooklyn, and 
Queens) carried out in the Office of Re- 
search and Evaluation of the Division of 
Teacher Education, a technique for objectively
observing and recording classroom
behaviors was developed. The Observation
Schedule and Record (OScAR)
was constructed by modifying and com- 
bining the methods proposed by Cornell 
(1) and Withall (4) on the basis of the 
results of tryouts of the two techniques. 
Three basic changes were made. 

Inspection of the reliabilities of the 
scales prepared by Withall and Cornell 
showed that some of them suffered from a 
lack of observer agreement to a degree 
that seriously impaired their accuracy (2). 
Accordingly, the first change was designed 
to increase observer accuracy. If an ob- 
servational technique is such that it takes 
a highly trained observer to use it suc- 
cessfully, it has limited usefulness, and 
results of future measurements may be 
suspect because the observers may be in- 
adequately trained. For this reason, the 
scales of both Cornell and Withall were 
redefined in somewhat simpler terms for 
use in the OScAR in order to minimize the 
amount of training necessary for its use. 

Experience with these two techniques 
also showed that the often-adopted prac- 
tice of sending several observers into the 
classroom together (presumably so that 
one observer can record what another 
misses) is uneconomical. A score based on 
observations made by two observers who 
see a teacher at different times is actually
more reliable than one based on observa- 
tions made by two observers who see the 
teacher at the same time; and it seems 
intuitively obvious that the former score 
is more valid as well, since the behavior 
sample obtained is twice as great. The 
OScAR was therefore designed to be used 
by a single observer visiting a classroom 
by himself. 

The third change involved was the separation
of the process of scoring from the
process of observing teacher behaviors.
The OScAR was designed to permit the 
recording of as many aspects of what goes 
on in a classroom as possible, regardless 
of their relationship to any dimension or 
scale. The observer’s sole concern was to 
see and hear as much of what was going 
on as he could, and to record as much of it 
as the structure of the OScAR permits, 
without any attempt to evaluate what he 
saw. 


DESCRIPTION OF THE OScAR TECHNIQUE


The OScAR technique is both a method 
of observing and a method of recording 
classroom behavior; in the interests of 
simplicity the two aspects will be de- 
scribed simultaneously. 

The observer making a visit to a class- 
room arrives at—or near—a prescheduled 
time, so it is usually not necessary for 
him to greet the teacher or class when he 
arrives. Instead, he tries to enter and take 
a seat at the back of the room as unob- 
trusively as possible. He first notes the 
time and the number of pupils present in 
the spaces at the upper left corner of the 
“front” of a specially printed 5 x 8 card 
(see Fig. 1).¹ Then he starts his stopwatch
and begins to record behaviors on the 
front of the card by checking as many of 
the items in the Activity Section as de- 
scribe what he sees. 

The Activity Section consists of 44 ac- 
tivities likely to be observed in a class- 
room, such as “teacher works with individ- 
ual pupil,” “pupil writes or manipulates 
at his seat,” “pupil laughs.” Varying numbers
of the Activity items may be checked,
according to how many different kinds of
activities are going on at one time.

¹ Tables A through G and Figures 1 and 2 have been deposited
with the American Documentation Institute. Order Document
No. 5556, remitting $1.75 for 35 mm. microfilm or $2.50 for
6 by 8 in. photocopies. Typescript copies of a more detailed
version of this paper containing all tables will be furnished
on request to the authors while the supply lasts.

The observer then concentrates on the
Grouping Section. The Grouping Section 
lists four sizes of groups from “at least 
half of class in group with teacher” and 
“at least half of class in group without 
teacher” to “pupil as individual.” In 
Column I he checks each type of admin- 
istrative group (i.e. group apparently set 
up by the teacher) that he can detect in 
the class and each type of social group he 
observes—a social group being defined as 
one in which there is pupil-pupil or pupil- 
teacher interaction. 

Next the observer checks the type of in- 
structional materials being used, in the 
Materials Section, which lists various 
learning aids and materials such as black- 
board, audio aid, text or workbook. All 
through this initial period, the observer 
keeps alert for any type of activity, group- 
ing, or material not already checked, and 
checks the appropriate item for each one 
as it occurs. No item on this side of the 
card is checked more than once during this 
time, however. Items in the Signs Section 
(which consists of items considered symp- 
tomatic of classroom climate, like “teacher 
shows affection for pupil” and “pupil 
moves freely”) are marked with a plus 
sign if and when they are observed. At 
the end of five minutes the observer 
briefly considers each item in this section 
not already marked, and marks it either 
plus or zero. 

As soon as he has done this, the observer 
stops his watch and turns the card over 
(See Fig. 2). In the Subject Section, which 
lists the 10 most common subject areas, 
he checks in Column I whichever of the 
10 areas of instructional activities has re- 
ceived most attention during the five 
minutes just ended. 

The observer then starts his stopwatch 
again and begins to tally each statement 
the teacher makes in one of five categories:
Pupil-Supportive, Problem-Structuring,
Miscellaneous, Directive, Reproving.
He makes a tally in Column II of the
Expressive Behavior Section in the line 
corresponding to the category in which 
each statement is classified. 

At the same time, he watches for 
changes of expression on the teacher’s 
face, such as smiles, frowns, and scowls, 
and for expressive gestures such as nods, 
threatening glances, and body movements. 
Each time he observes a look or gesture 
which he judges to express approval of or 
affection for a pupil, the observer makes 
a tally in Column II after Item K1; each 
time he observes a look or gesture which 
he judges to be hostile or reproving, he 
makes a tally after K7. 

This continues for a second period of 
five minutes. At the end the observer stops 
his watch again and fills out Column II 
in the Subject Section just as he filled out 
Column I at the end of the first five- 
minute period. He then turns the card 
over, starts his stopwatch again, and pro- 
ceeds as in the first period for five minutes 
more, except that he uses Column III 
rather than Column I. This alternation of 
sides of the card is continued until six 
five-minute periods of observations are 
completed. 


COLLECTION OF DATA


The observations which form the pri- 
mary data of this study were made with 
OScAR in the classrooms of 49 beginning 
teachers in public elementary schools in
New York City over a period of approxi- 
mately 10 weeks. Of the 49 teachers, 46 
were female, 3 male. The teachers were 
scattered among 19 schools in four bor- 
oughs, the number of teachers in a single 
school ranging from two to five. Twenty- 
three of the teachers taught Grade 3, 
thirteen Grade 4, nine Grade 5, and four 
Grade 6. 

Observers worked in pairs, two ob- 
servers visiting a school together. In most 
cases, all of the teachers in a school were
seen by both observers in a pair on the
same day, although in no case did two
observers visit the same teacher at the 
same time. No attempt was made to con- 
trol the type of activity observed; all that 
was asked was that the teacher and the 
class be present in the classroom. 

A number of minor shortcomings in the 
original OScAR (2) having been noticed, 
it was revised to the form described in 
this report. The new form was adopted at 
the beginning of the second round of visits. 
The pairs of observers who went to the 
schools together were reshuffled somewhat, 
and a new schedule in which each observer 
was to see each teacher once again was set 
up. The first visits were made on January 
24, 1955, and the last on Tuesday, April 
5, 1955. 


ANALYSIS OF THE OBSERVATIONAL RECORDS


The analysis of the data followed four 
steps. First, a preliminary study was made 
of each item to find out whether there 
were reliable differences in the number of 
times the behavior was observed in the 
classrooms of different teachers. Next, the 
items were combined into 14 “keys,” which 
were scored. Third, a factor analysis of 
scores on these 14 keys was made; and 
finally, the keys were combined into three 
factor dimensions. 

The results of the analysis of individual 
items are given in Tables A through F.¹
Except in the case of a few items that 
were highly reliable by themselves, those 
items that discriminated well were com- 
bined into provisional keys on the basis 
of a priori judgment that they belonged 
together. 

For example, the following three items
from the Activity Section:

E1. pupil talks to a group
E5. pupil demonstrates or illustrates
E10. pupil leads the class

were combined into a single key called
“Pupil Leadership Activities.” (The composition
of each of the 14 keys found to
discriminate is given in Table G.) The
reliability of each key was estimated from
a three-way analysis of variance under
mixed-model assumptions, teachers and
visits being regarded as random effects
and items as a fixed effect.


TABLE 2
LOADINGS OF FOURTEEN OScAR SCORING KEYS ON THREE ORTHOGONAL FACTORS
(N = 49)

                                           Emotional   Verbal     Social      Specific   Commu-
Scales                                     Climate     Emphasis   Structure   Factors    nalities
(1) Time spent on reading                   +.09        +.85       -.03        +.52       .73
(2) Problem-structuring teacher
    statements                              +.54        +.31       -.43        +.66       .57
(3) Autonomous administrative
    groupings                               +.15        +.59       +.49        +.63       .60
(4) Pupil leadership activities             +.55        -.41       +.30        +.66       .56
(5) Freedom of movement                     -.05        +.29       +.48        +.83       .32
(6) Manifest teacher hostility              -.76        +.05       -.22        +.61       .62
(7) Supportive teacher behavior             +.63     [illegible]   -.02        +.77       .41
(8) Time spent on social studies            -.16        -.60       +.13        +.78       .40
(9) Disorderly pupil behavior               -.73        +.06       +.20        +.65       .58
(10) Verbal activities                      +.03        +.65       +.01        +.76       .42
(11) Traditional pupil activities           +.47        +.43       -.32        +.70       .51
(12) Teacher’s verbal output                +.15        +.45       -.60        +.64       .59
(13) Audio-visual materials                 +.18        +.05       +.55        +.82       .34
(14) Autonomous social groupings            -.22        +.18       +.72        +.63       .60


TABLE 3
INTERCORRELATIONS AMONG THREE FACTOR SCALES BASED ON OScARs OF 49
BEGINNING TEACHERS
(Reliabilities in the Diagonal)

Scale                  EC        VE        SS
Emotional Climate    (.903)    -.004     -.110
Verbal Emphasis                (.770)    +.028
Social Structure                         (.826)


The coefficient of reliability so obtained 
is a maximum likelihood estimate of the 
expected correlation between the mean of 
all the scores assigned to the teachers by 
the six observers on the basis of the twelve 
visits made, and means of scores that 
would be assigned to the same teachers by 
six different observers visiting their class- 
rooms at twelve other times. Errors arising 


from three potentially important sources 
are taken into account: errors resulting 
from fluctuations in teacher and pupil be- 
haviors during several weeks, errors re- 
sulting from differences in ways in which 
various observers would tally identical 
behaviors, and errors resulting from the 
failure of an observer to note and record 
all that happens during a five-minute 
period. 

Table 1 shows the reliabilities of all 14 
scoring keys and their intercorrelations. 
The sizes of the reliability coefficients in- 
dicate that these teachers’ classes differed 
widely with respect to what was going on 
in them. 

The intercorrelations among the 14 di- 
mensions suggest that the differences 
might be described in terms of fewer than 
14 variables, so a centroid factor analysis 
was made and three factors extracted. 
The centroid factor matrix was rotated 
orthogonally twice according to the pro- 
cedure proposed by Reyburn and Taylor. 
Table 2 shows the loadings of the original 
keys on the three factors after rotation. 
The factors were named Emotional Cli-
mate, Verbal Emphasis and Social Struc-
ture.

Three scales were constructed by com-
bining, with equal weights and the signs
indicated by the loadings, the scores on 
those keys most highly loaded on each 
factor. Table 3 shows the reliabilities of 
these three scales and their intercorrela- 
tions. The three scales are practically in- 
dependent of one another (as would be 
expected) and are highly reliable. 
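The scale construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the loadings are borrowed from one loading column of the Table 2 rows for keys (10), (12), (13) and (14), while the key scores, the key names, and the cutoff used to decide which keys count as "most highly loaded" are hypothetical and do not come from the study.

```python
from typing import Dict


def unit_weighted_scale(key_scores: Dict[str, float],
                        loadings: Dict[str, float],
                        min_abs_loading: float) -> float:
    """Combine key scores with equal weights, signed by each key's loading.

    Only keys whose absolute loading reaches min_abs_loading are used.
    The cutoff is an assumption; the paper says only "most highly loaded".
    """
    total = 0.0
    for key, loading in loadings.items():
        if abs(loading) >= min_abs_loading:
            sign = 1.0 if loading > 0 else -1.0
            total += sign * key_scores[key]
    return total


# Loadings copied from one column of Table 2; key names are shorthand.
loadings = {
    "verbal_activities": 0.01,
    "teacher_verbal_output": -0.60,
    "audio_visual_materials": 0.55,
    "autonomous_social_groupings": 0.72,
}

# Hypothetical key scores for a single teacher.
key_scores = {
    "verbal_activities": 5.0,
    "teacher_verbal_output": 3.0,
    "audio_visual_materials": 2.0,
    "autonomous_social_groupings": 4.0,
}

scale_score = unit_weighted_scale(key_scores, loadings, min_abs_loading=0.50)
print(scale_score)  # 3.0  (-3.0 + 2.0 + 4.0; the .01 key is dropped)
```

Equal weighting ignores the size of each loading and keeps only its sign, which is what makes the resulting scales easy to score by hand.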


DISCUSSION 


The effort made in this study to secure 
quantitative, objective information about 
happenings in ordinary classrooms and 
typical learning situations was not in- 
tended to imply that ratings by super- 
visors and other qualified observers may 
not serve a useful purpose. It arose from 
the conviction that there are purposes 
such ratings cannot serve. One such pur- 
pose is research into the nature of teacher 
effectiveness, research seeking to answer
questions about how teachers influence 
pupil learning. 

Information that effective teachers are 
warm and friendly, or firm but fair, or 
that they explain things clearly, is useful 
in this sense only if these terms are oper-
ationally defined. If such operational defi-
nitions must be phrased in terms of expert
judgment, they can tell us only about ex-
pert judgment. Whatever inferences re-
search with a technique such as OScAR
justifies will tell educators what a teacher
should do in specific terms, not what
an expert's reaction to his behavior ought
to be.

A study of the factorial structure of the
14 scoring keys indicates that the OScAR
technique gives reliable information about
three relatively discrete dimensions of
classroom behavior: the social-emotional
climate, the relative emphasis on verbal
learnings, and the degree to which the
social structure centers about the teacher.


Certainly there must be many other im-
portant differences in ways teachers and
pupils behave that are not included in this
list. It is important that such differences
be identified and techniques developed for 
observing them. 

The potential importance of the kind of 
objective data about classroom behavior 
that can be obtained in this way is very 
great. Practical problems such as how to 
select students likely to become successful 
teachers, how to screen out those who can- 
not get along with children, and what 
ought to be the content of teacher training, 
can be solved in no other way than by 
studying teachers’ classroom behavior. 


SUMMARY AND CONCLUSIONS 


The OScAR was developed as a device 
for securing a record of behaviors of teach- 
ers and pupils observed by a classroom 
visitor. It was used in a series of 588 half- 
hour visits made by six observers visiting 
49 teachers twice each. Items which on the 
basis of content appeared to belong to- 
gether were grouped into 14 keys which 
were found to have reliabilities of at least
.60. A factor analysis identified three or-
thogonal factors accounting for most of
the observed differences. 

The three aspects in which the behav- 
iors observed in the 49 classrooms differed 
were: Emotional Climate, having to do 
with the relative amount of hostility ob- 
served; Verbal Emphasis, having to do 
with relative emphasis on verbal and tra- 
ditional schoolroom activities; and Social 
Structure, having to do with the relative 
degree of pupil-initiated activity. These 
three aspects were found to be orthogonal 
—a hostile class was no more likely to be 
verbal, or to have a restricted social or- 
ganization than one less hostile. 

It was concluded that (a) relatively un- 
trained observers using an instrument like 
OScAR can develop reliable information 
about differences in classrooms of different 
teachers, (b) that the OScAR technique is 
sensitive to only three of many dimensions
that probably exist, and (c) that obser-
vations made with instruments of this type
can contribute to the solution of many 
important problems having to do with the 
nature of effective teaching.


The essential purpose of this study was
to determine the success with which sev- 
eral test instruments could predict the 
pupil-teacher rapport achieved by a group 
of teachers. The participating subjects 
took the tests as student-teachers; the cri-
terion measure of rapport was obtained
approximately one year later in the class- 
rooms of the same subjects, who were then 
completing their first year of teaching. By 
employing test and criterion measures that 
were clearly separated in time, the study 
attempted to determine the predictive 
validities of the tests for the criterion used. 

Pupil-teacher rapport was measured 
through pupil responses to questions about 
their class and their teacher. The variable 
to be predicted was, therefore, not teacher 
behavior, but pupil reactions to teacher 
behavior. Since it cannot be assumed that 
pupils respond in similar fashion to similar 
teacher behaviors, tests that validly pre- 
dict various aspects of the classroom be- 
havior of teachers might not predict pupil 
responses to such behavior. For this rea- 
son, a number of measures based on the 
teachers’ classroom behavior were included 
in the study as a “bridge” between the 
test measures and criterion measure of 
major interest. 


METHOD 


During the 1953-54 academic year, over 
1600 students who were enrolled in student 
teaching in the four municipal colleges of 
New York City were given a battery of 
tests. Some of the instruments of the
battery were standardized inventories;
others were experimental in nature. The
students took the tests at the beginning
of the student-teaching semester, which
occurred at the end of their senior year.


This is one of a series of studies of
teacher behavior currently being conducted
by the Office of Research and Evaluation of
the Division of Teacher Education of the
Municipal Colleges of New York City. A
longer version of the present paper may be
had on request as long as the supply remains.

During the academic year 1954-55, a 
follow-up of the student teachers who 
were tested the year before and had 
subsequently received bachelor's degrees
was undertaken. Those students who were 
then teaching in Grades 3 to 6 in New 
York City public elementary schools in 
which at least one other member of the 
group was also teaching were encouraged 
to participate as subjects in an observa- 
tional study. Of approximately 75 teachers 
who met these criteria, it was possible to 
conduct intensive observations in the class-
rooms of 49. In addition, several tests
were administered to the pupils taught by
these 49 teachers and to the teachers
themselves.

This report will discuss three kinds of
data:

1. Test scores of 49 student-teachers 
obtained during their senior year in col- 
lege. 

2. Classroom behavior records obtained 
through systematic observation approxi- 
mately one year later in the classrooms of 
these 49 former student-teachers. 

3. Scores on pupil-teacher rapport as- 
signed to the 49 teachers on the basis of 
the reactions of their pupils to a paper- 
and-pencil attitudinal measure. 


Test Scores 


From the large group of tests taken by 
the student teachers, the authors selected 
the following tests which, on the basis of 
prior research and educational theory, 
could be expected to function as predictors 
of pupil-teacher rapport.

1. The Minnesota Teacher Attitude In-
ventory (MTAI). The MTAI was scored
with two keys: the first, the published,
empirically-derived key (3), and the sec-
ond, an experimental key in which the
items were scored on an a priori, rational
basis.

2. The California F Scale. A 30-item
version of the F scale was developed using
the item analysis data in The Authoritarian
Personality (1).

3. The Draw-a-Teacher Technique 
(DaTt). In the DaTt, a subject is given in- 
structions to “draw a teacher with a class” 
(10). The drawings were scored by three 
scorers along three dimensions—Teacher 
Initiative, Psychological Distance, and 
Traditionalism in Classroom Organization 
(9). Interscorer agreement was estimated 
by analysis of variance procedures; the 
following intraclass correlations were ob- 
tained: 


Teacher Initiative .................. .90

Psychological Distance .............. .93

Traditionalism in Classroom Organi-
zation ............................ .84


4. Sims SCI Occupational Rating Scale
(SCI). The SCI scale “is an instrument de- 
signed to reveal the level in our social 
structure—i.e. the social class—with which 
a person unconsciously identifies himself” 
(11, p. 1). A subject taking the SCI scale 
indicates whether he generally considers the 
people in each of 42 occupations (repre- 
sentative of varying levels of socioeconomic 
status) as belonging in the same, a higher, or 
a lower social class than he himself does. 

5. Strong Vocational Interest Blank (In- 
dex R). Index R is a 95-item key developed 
by Mitzel (8) for the Strong Vocational 
Interest Blank for Women. This key is com- 
posed of those items which successfully 
discriminated high-rapport and low-rapport 
teachers (differentiated on the basis of 
principals’ judgments and MTAI scores)
and which survived cross-validation (based 
on extreme groups differentiated by the 
MTAI). 

6. Inventory IV—Satisfaction Score. In- 
ventory IV is an experimental inventory 
consisting of 32 multiple-choice items deal-
ing with student-teaching experiences. It
is scored to obtain a measure which, on the 
basis of the manifest content of the re- 
sponses, appears to indicate the student- 
teacher’s satisfaction with the student-teach-
ing experience (2).


Measures of Classroom Behavior


A technique for observing and record- 
ing what occurs in a classroom, called 
the Observation Schedule and Record 
(OScAR), was developed to provide a 
means for objectively describing a variety 
of different classroom activities (6). The 
technique provides measures of the fre- 
quency of occurrence of specific classroom 
events, and requires few inferences on the 
part of the observer. 

Each of the 49 teachers was observed by 
six different research workers. They ob- 
served each teacher for two one-half 
hour periods, adding up to a total of 588 
observation periods. No two observers 
visited any given classroom at the same 
time. 

From the basic behavioral data sup- 
plied by the OScAR technique, indices 
of teacher and pupil activities, types of 
pupil groupings, classroom climate, and 
expressive behavior of the teacher were 
derived. Of 14 dimensions developed, the 
following four were selected for study in 
this report because they seemed to be con- 
ceptually related to both the test measures 
and the criterion. 

1. Disorderly Pupil Behavior. This di- 
mension focuses on pupil behavior which re- 
flects either hostility or disruptive activity 
(e.g., pupil ignores teacher’s question, scuf-
fles, etc.). It is a general index of the dis-
order present in a given classroom. Its re-
liability, determined by agreement among
observers of the same class on different oc-
casions, was estimated to be .89.

2. Manifest Teacher Hostility. This di- 
mension provides an index of the overt, 
hostile, nonintegrative activity of the 
teacher. Verbal and nonverbal behaviors 
judged to reflect teacher hostility (e.g.,
sarcasm and scowling) were tallied and
combined for this dimension. Its reliability
was estimated to be .92.

3. Pupil Leadership Activities. This di- 
mension provides an index of the amount 
of pupil leadership the teacher allows in
classroom activity. It is based on activities
in which a pupil addresses, or demonstrates
to, the class. The reliability of this measure
was estimated to be .72.

4. Freedom of Movement. This dimen-
sion offers an index of the freedom of move-
ment exhibited by both pupil and teacher
in the classroom. It reflects the teacher’s ap-
parent willingness to circulate among the
pupils, and the ease with which a pupil can
move about without requiring special per-
mission. Its reliability was estimated to be

Three other measures of the classroom 
were derived from global ratings of the 
classroom setting. The observers consulted 
the drawing scales developed and em-
ployed as part of the Draw-a-Teacher
technique described earlier. After com-
pleting an observation period, the observer
rated the class on each of the following
dimensions: Teacher Initiative, Psycho-
logical Distance, and Traditionalism in
Classroom Organization. The reliabilities
of these ratings were estimated to be .72,
.71, and .85, respectively.


Measure of Pupil-Teacher Rapport 


In the present study, pupil-teacher 
rapport was defined as the generalized,
conscious, subjective regard expressed by
pupils for their teacher. In order to
secure measures of the way in which the
pupils perceived their teacher, an inven-
tory, My Class, was constructed (5).

This inventory consists of 47 scored items
comprising four scales: Halo, Disorder,
Supportive Behavior, and Traditionalism.
The Halo scale is designed to indicate the
extent to which the pupils have a general
feeling of liking for the teacher, while
the other three scales are intended to
measure fairly specific teacher and pupil
behaviors.

My Class was administered to all the 
pupils in the classes of the 49 teachers
participating in the study. The items
were read aloud to the pupils by a test
administrator, while the teacher sat at the
back of the room and filled out an inven-
tory unrelated to the pupils’ activity.
The proportion of the class giving the
keyed response to each item was used as
the teacher’s score on that item. A
teacher’s score on each scale was the sum
of proportions for all of the items of that
scale, appropriately weighted plus or
minus.
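The scoring rule just described (item score = proportion of the class giving the keyed response; scale score = signed sum of those proportions) can be sketched as follows. The item names, the pupil responses, and the choice of which item carries a minus weight are hypothetical; the paper does not list the weights.

```python
from typing import Dict, List


def item_proportion(keyed: List[bool]) -> float:
    """Proportion of pupils in one class who gave the keyed response."""
    if not keyed:
        raise ValueError("no pupil responses for this item")
    return sum(keyed) / len(keyed)


def scale_score(class_responses: Dict[str, List[bool]],
                weights: Dict[str, int]) -> float:
    """Sum of item proportions, each weighted +1 or -1."""
    return sum(weight * item_proportion(class_responses[item])
               for item, weight in weights.items())


# Hypothetical class of four pupils; True means the keyed response was given.
class_responses = {
    "like_this_class": [True, True, True, False],      # proportion .75
    "fun_in_this_class": [True, False, True, False],   # proportion .50
    "feel_like_staying_away": [True, False, False, False],  # proportion .25
}

# Hypothetical signs: two items count for the teacher, one against.
weights = {
    "like_this_class": +1,
    "fun_in_this_class": +1,
    "feel_like_staying_away": -1,
}

teacher_score = scale_score(class_responses, weights)
print(teacher_score)  # 1.0  (.75 + .50 - .25)
```

Because each item contributes a proportion rather than a count, a teacher's score does not depend on class size.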

The Halo scale consists of the following 
eight items scattered throughout the My 
Class inventory: 

1. Do you ever feel like staying away 
from school? 

2. Do you like to be in this class? 

3. Do you have much fun in this class? 

4. Do you learn a lot in this class? 

5. Are you proud to be in this class? 

6. Do you always do your best in this 
class? 

7. Do most of the pupils like the 
teacher? 

8. Does the teacher help you enough?

The reliability of this scale, estimated by
analysis of variance procedures, was .89. 


RESULTS


For each teacher in this study, there 
were 17 measures:* nine test scores, seven 
classroom observation measures, and one 
measure of pupil-teacher rapport based on 
pupil reactions. The primary analysis of 
these data consisted of correlating each 
of these measures with the other 16. 
The resulting correlations are contained 
in Table 1. From an examination of Table 
1 it is clear that: 

1. None of the tests correlates sig- 
nificantly with the measure of pupil- 
teacher rapport. 

2. None of the 63 correlations between 
the test variables and the classroom be- 
havior variables is significant except that 
between the Teacher Initiative score on 
the DaTt and the OScAR dimension, 
Freedom of Movement. 

3. The only classroom behavior variable
that correlates significantly with the Halo
score of My Class is the OScAR dimen-
sion Manifest Teacher Hostility. The cor-
relation is in the “expected” direction,
i.e., the pupils’ liking for their teacher as
indicated on My Class decreases with the
amount of manifest teacher hostility re-
corded by observers. Evidently the more
hostility a teacher displays in the class-
room, the less esteemed she is by her
pupils.


*This is not strictly speaking the case,
since complete test data were not available
for every one of the 49 subjects. See foot-
note a on Table 1.

A multiple regression analysis was em- 
ployed using the pupil-teacher rapport 
criterion with all of the test variables ex- 
cept the MTAI-Rational Key score as 
independent variables. When weighted
optimally with the partial regression coef-
ficients, the eight test scores correlated
.496 with Halo. This multiple correlation
coefficient is not significant.
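The paper does not show how significance was judged. One conventional check, offered here as a sketch rather than as the authors' procedure, is the F ratio for a multiple correlation. It assumes all 49 teachers entered the regression, which the footnote on missing test data suggests is an upper bound, and it takes the .05 critical value for 8 and 40 degrees of freedom (2.18) from a standard F table.

```python
def f_ratio_for_multiple_r(multiple_r: float, n_cases: int,
                           n_predictors: int) -> float:
    """F ratio testing whether a multiple correlation differs from zero."""
    r_squared = multiple_r ** 2
    df_error = n_cases - n_predictors - 1
    if df_error <= 0:
        raise ValueError("need more cases than predictors plus one")
    return (r_squared / n_predictors) / ((1.0 - r_squared) / df_error)


MULTIPLE_R = 0.496     # reported correlation of the eight tests with Halo
N_PREDICTORS = 8       # eight test scores
N_TEACHERS = 49        # assumption: complete data for every teacher
F_CRITICAL_05 = 2.18   # tabled .05 value for 8 and 40 degrees of freedom

f_ratio = f_ratio_for_multiple_r(MULTIPLE_R, N_TEACHERS, N_PREDICTORS)
is_significant = f_ratio > F_CRITICAL_05
print(round(f_ratio, 2), is_significant)  # 1.63 False
```

With fewer than 49 complete cases the error degrees of freedom shrink and the F ratio falls further, so the conclusion of non-significance would not change.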


DISCUSSION


The major finding of this study is the 
failure of the tests, singly or in combina- 
tion with one another, to predict subse- 
quent pupil-teacher rapport as measured 
by the Halo scale. Each of the tests was
selected for study because theory or past
research, and sometimes both, encouraged
its use as a potential predictor. The fact 
that none of the tests adequately func- 
tioned to predict pupil-teacher rapport is 
therefore of particular interest. 

One of the distinguishing features of 
this study is that the tests were ad-
ministered to a group of college seniors
who had not yet served as teachers. Since
the tests and criterion were well separated
in time, the study deals with the pre-
dictive, rather than concurrent, validities
of the tests employed. In general, the re-
sults offer no evidence of the predictive
validity of any of the tests for the par-
ticular criterion measure studied. The tests
not only failed to predict rapport, they
did not correlate with the objective meas-
ures of behavior in the classroom. Of the
63 correlations between test variables and
classroom behavior variables, only the re-
lationship between the Teacher Initiative
dimension of the DaTt and the Freedom
of Movement dimension of OScAR proved
significant.

The fact that a test has concurrent 
validity is often incorrectly used to sup- 
port a recommendation for its use as a 
predictive measure. Thus, the MTAI has 
been shown to correlate with various in- 
dependent measures of pupil-teacher rap- 
port (3), including measures based on 
pupil responses to questions such as were 
used in My Class. The well-established 
concurrent validity of the MTAI does 
not, however, demonstrate that the test 
is of predictive value, and the recommen- 
dation contained in the test manual that 
the inventory be used as a predictor is ac- 
cordingly without empirical support. The 
evidence of the present investigation, 
which is the only published research of 
which the writers are aware involving the 
correlation of the MTAI and a subse- 
quent measure of pupil-teacher rapport, 
would argue strongly against its use as a 
predictive instrument. 

It may be important to note that the 
pupil-teacher rapport criterion used in 
this study was not so uniquely or com- 
pletely determined by the personalities 
of the pupils who responded to My Class 
as to be unrelated to measurable, be- 
havioral variables in the teacher. As Table 
1 indicates, one of the measures of the 
teachers’ classroom behavior, Manifest 
Teacher Hostility, correlates significantly 
with the criterion. Moreover, in a pre- 
viously reported investigation, Medley 
and Williams (7) found that the Halo 
scores of the 49 teachers in this study 
correlated significantly (r = +.34) with 
their scores on a concurrent test measure 
of hostility.² Since the criterion used in
this study is correlated with a concurrent
measure of the teachers’ classroom be-
havior and test behavior, the failure of
the predictive instruments cannot be at-
tributed to inherent unpredictability of
the criterion.

In the past, demonstrations of the con-
current validity of tests have, too often,
been uncritically accepted as evidence of
their predictive value. The study reported
here, however, adds support to the grow-
ing view that the predictive value of tests
can only be established through predic-
tive studies.


SUMMARY


A large group of student teachers were
given a number of personality and at-
titude tests during their senior year in
college. Observations were conducted ap-
proximately one year later in the rooms
of 49 of these subjects who were employed
as elementary school teachers. A measure
of pupil-teacher rapport based on pupil
responses to questions about their teacher
and their class was also obtained.

In general, none of the test measures
correlated significantly with pupil-teacher
rapport as measured. Only one of the 63
correlations between the test measures
and classroom behavior measures proved
significant. Manifest Teacher Hostility, a
measure based on classroom observation of
the teacher, correlated significantly with
rapport.

The implications of these results for
the prediction of pupil-teacher rapport
were discussed.


² It is of interest to note that the Hostility
scale was built by selecting 50 items from
among those on the Minnesota Multiphasic
Personality Inventory that were found to
discriminate significantly between teachers
scoring high and low on the MTAI (4). In
view of the manner in which the Hostility
scale was developed, it is difficult to deter-
mine why it should correlate significantly
with the Halo scale while the MTAI does
not. Only when the temporal relations of
the Hostility scale and the MTAI to the
criterion are fully appreciated does the dif-
ference in the validity coefficients become
understandable.


When the so-called readability formulas
are used only as rough estimating devices 
for the encouragement of popular writing, 
statistical precision is not vitally impor- 
tant. But if they are to be considered re- 
search tools in studies of comprehension or 
learning, it becomes very important to 
build into them as much precision as pos-
sible.

Current readability formulas offer at 
least two opportunities for reexamination 
for the sake of greater precision. First, 
many are based on reading comprehension 
tests published in 1926 and drawn from 
empirical testing of school pupils prior 
to that date. Thus they may not ade- 
quately reflect changes in either the lan- 
guage or the population of the present 
decade. Second, the “ratings” produced by 
present tests are not accompanied by a 
standard error figure, and hence tell 
nothing about significance of estimates and 
differences.

The revision of the set of graded test 
passages used in building two widely used
readability formulas, the Flesch Reading
Ease Formula (3) and the Dale-Chall
readability formula (1), has offered an
opportunity for revision of the formulas
and also for further comparative evalua-
tion of these two indexes of comprehen-
sion difficulty. The two formulas were
originally calculated, following Lorge (6),
by making measurements of sentence
length and vocabulary difficulty in the
1926 edition of the McCall-Crabbs Graded
Test Lessons in Reading (7). Both for-
mulas make use of the same sentence
length measure, the average number of words
per sentence. For a vocabulary measure-
ment, Flesch uses the number of syllables
per 100 words, while Dale and Chall count
the number of words that do not appear
on a list of 3,000 words which had proved
“familiar” for youngsters tested in the 
fourth grade of public schools. 

Results with the two formulas are not 
directly comparable for several reasons. 
Scores are given in different terms—Flesch 
results on a scale of 100 (easy) to 0 
(difficult) and Dale-Chall results on a 
scale of about 3 (easy) to 14 (difficult). 
Flesch’s formula was calculated with 
grouped data, while Dale and Chall com- 
puted theirs with ungrouped data. In 
addition, Flesch made an adjustment in 
one of the formula terms after computa- 
tion, while Dale and Chall did not.* 

More recent arrivals on the readability 
scene are the Farr-Jenkins-Paterson sim- 
plification of the Flesch Reading Ease 
formula (2) and the Gunning Fog Index 
(4). The former uses a count of percentage 
of monosyllables instead of the Flesch 
syllable count, while the latter uses a count 
of polysyllables (words of more than two 
syllables). Both formulas take sentence 
length into account. Both are viewed by 
many as simplifications of the Flesch for- 
mula. 

The McCall-Crabbs tests were revised 
considerably in 1950 (8). There is evi- 
dence that the questions and passages of 
the 1926 edition were changed considerably 
in the 1950 edition. At least 60 of the 
tests in the 1950 edition are different in 
subject from those in the earlier edition. 


1 This adjustment concerned the criterion 
value used in developing the regression 
equation. Both Flesch and Dale-Chall used 
as a criterion the average school grade of 
pupils answering correctly 50% of the ques- 
tions accompanying the reading passages. 
But Flesch adjusted the formula to predict 
the grade of the pupil who could answer 
75% of the questions correctly. This changed 
the regression formula constant.


TABLE 1
AVERAGES AND STANDARD DEVIATIONS OF
MEASUREMENTS IN TWO EDITIONS OF
THE MCCALL-CRABBS STANDARD
TEST LESSONS IN READING

                                                        1926 Edition
Measurement                     1950 Edition      Flesch          Dale-Chall

Mean average grade of pupils    4.9862            5.4973          5.7492
answering 50% correctly         (s = 1.1068)      (s = 1.3877)    (s = 1.6565)
(criterion)

Average number of words         15.3986           16.5213         16.8037
per sentence                    (s = 3.8373)      (s = 5.5509)    (s = 5.3818)

Average syllables per           131.6131          134.2208        -
hundred words                   (s = 11.830)      (s = 13.6845)

Average percentage of words     6.9413            -               8.1011
not on Dale list                (s = 5.8200)                      (s = 6.3056)

Average percentage              75.1148           -               -
monosyllables                   (s = 6.8083)

Average percentage              5.7603            -               -
polysyllables                   (s = 4.5835)

Those are the passages dealing with World 
War II, atomic energy, modern aviation, 
and similar recent developments. Table 1 
shows how the two editions differed in 
averages and standard deviations of the 
various measurements. 


PURPOSE OF REVISION

It was felt that recalculation of these 
four formulas with the 1950 tests as a 
criterion would accomplish two main pur- 
poses: (a) modernize the formulas by 
taking advantage of the more recently ad- 
ministered tests which should reflect some 
of the changes in pupil reading abilities 
between 1926 and 1950, and (b) establish 
formulas which are derived from identical 
materials, measured by identical rules of 
measurement on the common factor, cal- 
culated by identical mathematical opera- 
tions, and reported without adjustment. 
The latter goal seems desirable because it 
will make further comparative studies 
easier to perform and interpret (i.e., no
manipulations of the recalculated formulas
will be needed in future research toward
modernization and validation). It would 
also allow averaging of several formula re- 
sults for any sample of writing, thus per- 
haps giving more accurate scores where 
extreme accuracy is needed. 


METHODS

The following measurements were made
in the 383 prose passages of the 1950 edi-
tion of the McCall-Crabbs tests:

1. Average grade score of pupils an-
swering half the test questions correctly.

2. Average number of words per sen-
tence in each passage.

3. Number of syllables per 100 words
in each passage.

4. Percentage of words in each passage
not appearing on Dale’s list of 3,000 “easy”
words.

5. Percentage of monosyllables in each
passage.

6. Percentage of polysyllables in each
passage.

Regression formulas were computed
with these measurements,² and the results
of the formulas were compared by ap-
plying them to 113 samples of writing
from various publications to determine the
practical significance of differences in for-
mula results. The recalculated Flesch and
Dale-Chall formulas were also compared
with each other and with results from
the original formulas³ in a sample of 40
of the McCall-Crabbs passages. Such com-
parisons with the other recalculated for-
mulas were not possible because adjust-
ments of the original formulas were not
possible or because different rules for
word counting were used in the original
formula and the recalculation.


² Calculation facilities used were in the
Wisconsin Numerical Research Laboratory,


search Committee.

³ The original formulas were as follows:
Flesch: 206.84 - (1.015)(sent. length) -
(.846)(syllables per 100 words). Dale-Chall:
3.6365 + (.0496)(sent. length) + (.1579)(%
non-Dale words). Gunning: .4 (sent.
length + % polysyllables). F-J-P: -31.517 -
(1.015)(sent. length) + (1.599)(% mono-
syllables). The Flesch formula used for
comparison with the new one had to be
adjusted back to predict at the 50% level
of the criterion and the scale reversed (i.e.,
changed back to the form which was pre-
sumably yielded directly by Flesch’s com-
putations before he made the various adjust-
ments). This unscaled, reversed formula, as
nearly as we can determine, is -7.5695 +
(.1015)(sent. length) + (.0846)(syllables per
100 words). It was not possible to put the
Farr-Jenkins-Paterson simplification on such
a basis.


RESULTS


The calculations yielded the following
recalculated formulas:


Flesch: -2.2029 + (.0778)(sentence
length) + (.0455)(syllables per 100
words)

Dale-Chall: 3.2672 + (.0596)(sen-
tence length) + (.1155)(% non-
Dale words)

Farr-Jenkins-Paterson: 8.4335 +
(.0923)(sentence length) - (.0648)
(% monosyllables)

Gunning Fog Index: 3.0680 + (.0877)
(sentence length) + (.0984)(%
polysyllables)


The coefficients of multiple determina-
tion (R²), which indicate the amount of
variation in difficulty among the tests
which is accounted for by the two style
variables in the formula, are .4034 for
the recalculated Flesch formula, .5092 for
the recalculated Dale-Chall formula, .3407
for the Farr-Jenkins-Paterson recalcula-
tion, and .3440 for the Gunning recalcula-
tion. These statistics, which are corrected
for degrees of freedom, show that the re-
calculated Flesch formula statistically
“explains” some 40% of the variation in
difficulty of the McCall-Crabbs tests. The
Dale-Chall formula explains almost 51%,
and is thus the much more powerful tool
for predicting reading difficulty. The Farr-
Jenkins-Paterson and Gunning formulas
as recalculated are about equal in predic-
tive power, both considerably weaker
than the other formulas.
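As an illustration, the four recalculated formulas can be written as plain functions and checked against the 1950-edition means in Table 1. A least-squares equation passes through the means of its variables, so each formula evaluated at those means should return roughly the mean criterion grade of 4.9862. The function and variable names below are illustrative and do not come from the paper; the Dale-Chall constant is the one printed in the text (3.2672).

```python
def flesch_recalculated(sentence_length: float,
                        syllables_per_100_words: float) -> float:
    """Recalculated Flesch formula; result is a school grade."""
    return -2.2029 + 0.0778 * sentence_length + 0.0455 * syllables_per_100_words


def dale_chall_recalculated(sentence_length: float,
                            pct_non_dale_words: float) -> float:
    """Recalculated Dale-Chall formula; constant as printed in the text."""
    return 3.2672 + 0.0596 * sentence_length + 0.1155 * pct_non_dale_words


def farr_jenkins_paterson_recalculated(sentence_length: float,
                                       pct_monosyllables: float) -> float:
    """Recalculated Farr-Jenkins-Paterson formula."""
    return 8.4335 + 0.0923 * sentence_length - 0.0648 * pct_monosyllables


def gunning_recalculated(sentence_length: float,
                         pct_polysyllables: float) -> float:
    """Recalculated Gunning Fog Index."""
    return 3.0680 + 0.0877 * sentence_length + 0.0984 * pct_polysyllables


# 1950-edition means as reported in Table 1.
MEAN_SENTENCE_LENGTH = 15.3986
MEAN_SYLLABLES_PER_100 = 131.6131
MEAN_PCT_NON_DALE = 6.9413
MEAN_PCT_MONOSYLLABLES = 75.1148
MEAN_PCT_POLYSYLLABLES = 5.7603
MEAN_CRITERION_GRADE = 4.9862

grades_at_means = {
    "Flesch": flesch_recalculated(MEAN_SENTENCE_LENGTH, MEAN_SYLLABLES_PER_100),
    "Dale-Chall": dale_chall_recalculated(MEAN_SENTENCE_LENGTH, MEAN_PCT_NON_DALE),
    "Farr-Jenkins-Paterson": farr_jenkins_paterson_recalculated(
        MEAN_SENTENCE_LENGTH, MEAN_PCT_MONOSYLLABLES),
    "Gunning": gunning_recalculated(MEAN_SENTENCE_LENGTH, MEAN_PCT_POLYSYLLABLES),
}

for name, grade in grades_at_means.items():
    # Each value lands within about .01 of the mean criterion grade.
    print(f"{name}: {grade:.4f} (difference {grade - MEAN_CRITERION_GRADE:+.4f})")
```

All four results fall within about one hundredth of a grade of 4.9862, which supports the coefficients as printed in the text.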

The error terms for the formulas are .85 
school grades for the Flesch formula, .77 
grades for the Dale-Chall formula, and 
.90 grades for the others. Converting the 
predicted value for each formula into a 
grade level figure and following the stand- 
ard practice of taking a range of plus or 
minus two standard errors as the probable 
area in which the “true” value lies, the 
error range would be 1.71 grades for the 
Flesch formula, 1.55 grades for the Dale- 
Chall formula, and 1.80 grades for both 
the others. Thus the Dale-Chall formula 
came through the recalculations as slightly 
more precise than the others. 

Table 2 presents comparisons of various 
statistics for the Flesch formula and Dale- 
Chall formula in their recalculated and 
original forms and for the recalculated 
Farr-Jenkins-Paterson simplification and 
Gunning Index. 

To assess the practical significance of 
the revision, the original and recalculated 
forms of the Flesch and Dale-Chall for- 
mulas were applied to 47 sample passages 
from a variety of sources. The recalculated 
Dale-Chall formula consistently gave lower 
scores than the original; the average discrepancy (average absolute deviation) between the two was .94 grades. The average
discrepancy between the original and re- 
calculated Flesch formulas was .85 grades, 
with the recalculated formula giving a 
lower score about two-thirds of the time. 

All four recalculated formulas were com- 
pared in a sample of 113 passages from 
15 magazines. The results are given in 
Table 3. The writers feel that two observa- 
tions from the table are worthy of mention 
here. First, the average discrepancy of results using the recalculated Flesch and Dale-Chall formulas was .54 school grades. In the comparison of the original Dale-Chall and Flesch formulas above, the average discrepancy was .87 grades. All four recalculated formulas agreed much more closely with one another than the original Dale-Chall and Flesch formulas did. This would seem to be a point in favor of the recalculations.


TABLE 2
REGRESSION STATISTICS FOR THE RECALCULATED AND ORIGINAL FORMULAS

               Flesch Formula            Dale-Chall Formula        F-J-P          Gunning
Statistic(a)   Recalculated  Original    Recalculated  Original    Recalculated   Recalculated
r²_12             .2019       .2695         .2019       .2191        .2019          .2019
r²_13             .3436       .4420         .4759       .4670        .2526          .2665
r²_23             .1363       .2157         .1599       .2607        .1055          .1265
r²_12.3           .0987       .1743†        .0450       .1331†       .1146          .1013
r²_13.2           .3117       .2202†        .3883       .1936†      -.6293          .1984
b_12.3            .0778       .1015         .0596       .0496        .0923          .0877
b_13.2            .0455       .0846         .1155       .1579       -.0648          .0984
β_12.3            .2697       .2639         .2065       .1611        .3199          .8042
β_13.2            .4865       .5422         .6073       .6011        .3986          .4081
a_1.23          -2.2029     -7.5695        3.2790      3.6365       8.4335         3.0680
R²_1.23           .4034       .4966         .5092       .4900        .3407          .3440

Note. † Values computed from those given by Flesch or Dale and Chall by the relationship:

\[ r_{ij.k} = \frac{r_{ij} - r_{ik}\, r_{jk}}{\sqrt{(1 - r_{ik}^{2})(1 - r_{jk}^{2})}} \]

(a) Subscripts refer to (1) the criterion, (2) average sentence length. Subscript (3) refers to a different variable for each formula: syllables per 100 words for the Flesch formula, percentage of words not on the Dale List for the Dale-Chall formula, percentage of monosyllables for the Farr-Jenkins-Paterson formula, and percentage of polysyllables for the Gunning Index.


TABLE 3
COMPARISONS BETWEEN RESULTS WITH RECALCULATED FORMULAS APPLIED TO 113 100-WORD SAMPLES OF PROSE

                          Positive deviations     Negative deviations     Average
Comparison                Average size  (Cases)   Average size  (Cases)   deviation
Dale-Chall and Flesch         .51        (58)         .57        (55)        .54
Flesch and Gunning            .54        (82)         .15        (31)        .44
Dale-Chall and Gunning        .65        (82)         .31        (29)        .56
Flesch and F-J-P              .56        (96)         .13        (16)        .50
Dale-Chall and F-J-P          .73        (93)         .35        (20)        .66
Gunning and F-J-P             .36        (70)         .57        (42)        .54


The recalculated Gunning Index gave 
results that were in slightly higher agree- 
ment with the results of the recalculated 
Flesch formula than were the results with 
the recalculated Farr-Jenkins-Paterson 
simplification. The average absolute deviation between the recalculated Flesch formula and Gunning Index was .44 grades, with 73% of the predictions lower than those of the Flesch formula. The average absolute deviation between the recalculated Flesch formula and the recalculated Farr-Jenkins-Paterson simplification was .50 grades, with 85% of the results with the simplification being lower than results with the Flesch formula. Thus the two simplifications gave slightly lower scores than the recalculated Flesch formula. Scores
with the recalculated Gunning Index were 
slightly closer to the Flesch results, and 
there were more instances of predictions 
which were higher than the Flesch predic- 
tions than was true of the Farr-Jenkins- 
Paterson formula as recalculated. 
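
As a rough check on the figures just quoted, the average absolute deviation and the share of cases on one side can be recomputed from the average sizes and case counts in Table 3. The sketch below is an illustration only: the function name `summarize` and the dictionary layout are hypothetical, and because the tabled average sizes are themselves rounded, the recomputed averages agree with the printed ones only to within about .01.

```python
TOTAL_SAMPLES = 113  # number of 100-word passages compared

# (average size of positive deviations, cases, average size of negative
# deviations, cases), as read from Table 3 for two of the comparisons.
table3 = {
    "Flesch and Gunning": (0.54, 82, 0.15, 31),
    "Flesch and F-J-P": (0.56, 96, 0.13, 16),
}


def summarize(pos_size, pos_cases, neg_size, neg_cases, total=TOTAL_SAMPLES):
    """Return (average absolute deviation, share of positive-deviation cases)."""
    average_deviation = (pos_size * pos_cases + neg_size * neg_cases) / total
    share_positive = pos_cases / total
    return average_deviation, share_positive


results = {name: summarize(*row) for name, row in table3.items()}

for name, (deviation, share) in results.items():
    print(f"{name}: average deviation about {deviation:.2f} grades, "
          f"{share:.0%} of cases in the positive column")
```

The shares come out at roughly 73% and 85%, matching the percentages of predictions reported as lower than those of the Flesch formula.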


TABLE 4
NORMS OF RECALCULATED FORMULA SCORES FOR MATERIAL OF VARIOUS TYPES

Material and style                          Flesch       Dale-Chall    Gunning      Farr-Jenkins-Paterson

Scientific: Phytopathology, Soil Science, Journal of Nutrition, Science, American Journal of Veterinary Research (Difficult)
    Means                                   8.00         8.50          7.70         7.20
    Ranges                                  7.10-9.50    7.10-10.70    6.70-8.90    6.30-7.40

Academic: Yale Review, Harvard Educational Review, Annals of the American Academy of Political and Social Science (Difficult)
    Means                                   7.90         8.40          7.60         7.00
    Ranges                                  7.00-8.70    7.50-8.60     7.10-8.60    6.50-8.00

Harper's, Atlantic Monthly (Fairly Difficult)
    Means                                   6.80         6.70          6.40         6.50
    Ranges                                  6.30-7.60    5.50-8.60     5.40-8.50    5.40-6.80

Reader's Digest, ... (...)
    Means                                   6.00         6.10          5.70         5.90
    Ranges                                  5.00-7.20    4.80-7.00     5.00-6.70    4.90-6.70

Collier's, Ladies' Home Journal, Good Housekeeping (Fairly Easy)
    Means                                   4.90         4.90          4.90         4.90
    Ranges                                  4.30-6.20    4.50-6.70     4.40-6.20    4.40-6.50

Pulp Fiction: True ..., ... Confessions (Easy)
    Means                                   4.30         4.20          4.30         4.20
    Ranges                                  3.70-4.40    3.70-4.50     3.80-4.80    3.70-4.50


The application of the formulas to the passages also provides some “norms” for interpreting scores which they yield. The passages came from various types of publications, presumably representing generally different levels of reading difficulty as noted in Table 4.

Use of such a scale is admittedly a rough manner of interpreting readability scores, and the scale in Table 4 was not formed in the most exact manner; although sampling was at random within issues, the issues were not randomly chosen and calculations were rounded. However, this general approach seems more desirable than using the theoretical formula result (grade level) or making adjustments in the theoretical result without benefit of extended testing. Further details on background, method, and results of this work are available on microfilm (9).


CONCLUSIONS AND DISCUSSION 


To recommend use of the four recalcu- 
lated formulas in preference to the origi- 
nal ones is a rather drastic step, in view 
of the wide use the original formulas have 
enjoyed. However, such a recommendation 
is made here for the reasons we set forth 
in the paragraphs on the purpose of the 
recalculation. 

The formula coefficients derived in the 
recalculations on the 1950 McCall-Crabbs 
tests have the same statistical validity as 
those calculated on the 1926 edition of the 
tests. They are statistically preferable to 
those formed by rougher, short-cut pro- 
cedures. 
Reservations in making a recommendation to use the recalculated Dale-Chall and
Flesch formulas stem from two basic
sources: (a) Readability formulas are such
rough estimates at best that to say one 
result is better than another is statistically 
hazardous—especially when the nature of 
the material on which the formulas are to 
be used differs from that of the material 
used in computing the formula. (b) In 
the revision of the McCall-Crabbs criterion 
tests, passages of higher difficulty were 
omitted. The style measurements of these 
passages and the educational level of pu- 
pils taking these more difficult passages 
might have been of a type which more 
nearly approaches the type of writing and 
audience for which the formulas are nor- 
mally used. In other words, restriction of 
the range of difficulty in the 1950 tests may 
have made this edition less suitable than 
the 1926 tests for building readability for- 
mulas. But to the extent this argument 
is sound, all linear formulas suffer equally 
from the curvilinearity it implies. 

It is further recommended that the 
Dale-Chall formula be used whenever pos- 
sible in the absence of specific reasons for 
preferring the Flesch formula or one of its 
simplifications. The Dale-Chall formula 
was best in terms of small error and high 
prediction power. This parallels an earlier 
judgment by Klare (5) that the original 
Dale-Chall formula was better than the 
original Flesch formula by a slight margin. 

The statements here as to error and pre- 
diction power of the formulas apply only 
to prediction and precision in regard to 
the criterion passages. They do not un- 
equivocally hold true for the formulas as 
they are normally used—for estimating 
difficulty of adult reading materials. It is 
possible that a formula with low precision 
or predictive power in this research could 
be fully as precise as the others for pre- 
dicting adult reading difficulty. But there 
is no direct evidence that this would be so,
and the only recourse at present seems to
be to give the Dale-Chall formula the
highest place on the basis of its prediction 
power and small error computed on the 
criterion. 

Some formula-users—particularly those 
who use formulas only occasionally—are 
understandably reluctant about referring
to a word list, which is required by the 
Dale-Chall formula. Of popular formulas 
without word lists, the Flesch formula is 
statistically best.

Those who use either simplification of 
the Flesch formula should recognize that 
they are sacrificing precision and accuracy 
by doing so. But it seems evident that for 
estimates of readability which need to be 
performed rapidly and where precision is 
not extremely important, either simplifica- 
tion will do the job. 

There are two ways of looking at pre- 
cision in a readability formula. One way 
is to admit that formulas are rough esti- 
mates at best, and that a loss of a little 
precision is not important. The other is 
to argue that since the formulas give only 
rough estimates, it is important to keep 
whatever precision and prediction power 
exists. 

The choice of viewpoint seems to hinge 
on the use to be made of formula results. 
A news writer or editor who uses a formula 
“to see how we are doing” could probably 
regard all four formulas as equal for his 
purpose and use whichever formula he 
found easiest to apply. If readability scores 
are part of a research design, however, the 
social scientist will want to choose the most 
powerful and precise formula even though 
it entails more difficulties in application. 
NOTE ON THE ORGANISMIC AGE CONCEPT


PAUL BLOMMERS AND J. B. STROUD 
State University of Iowa 


In the March 1955 issue of this journal 
the authors, together with Knief, presented 
data showing that the use of height age, 
weight age, and dental age contributed 
practically nothing to a least squares 
estimate of either reading or arithmetic 
achievement when combined with mental 
age (1). That is to say, mental age alone 
provided about as accurate an estimate 
of achievement in these areas as a least 
squares combination of mental age and 
these three other age scores. 

It is well-known that for the model 
assumed (usually a first degree poly- 
nomial) a least squares combination of 
scores provides composite scores for the
individuals at hand which bear a maxi- 
mum degree of relationship to the cri- 
terion. Hence, multiple correlations be- 
tween achievement and the component 
variables that enter into organismic age 
(OA) cannot be lower than the simple 
correlations between achievement and 
mental age (MA) alone. In such multiple 
correlation analyses the component age 
scores are, of course, automatically ideally
weighted before being combined. In the 
formation of the OA score, on the other 
hand, the component age scores are given 
equal weights* since the OA score is the
simple unweighted average of the com- 
ponent age scores. Because of the nature 
of the relationships among these age scores 
it follows that the correlation between 
educational achievement and OA must 
necessarily be considerably lower than that 
between achievement and MA alone. The 
purpose of this note is to demonstrate this fact analytically. We shall also use data reported in our previous article to illustrate the extent of this attenuating effect of anatomical and physiological age scores when used in combination with MA to predict school achievement.

*By a weight we mean the constant by which the score for a trait is multiplied before it is combined with other similarly weighted scores to form a composite.

In the following discussion and in keep- 
ing with our previous article, we shall 
consider as estimators only mental, height, 
weight, and dental age scores. We shall 
designate these as M, H, W, and D, re- 
spectively. Reading and arithmetic scores 
will be used as measures of school achieve- 
ment and will be designated R and A, 
respectively. Since the correlation between 
the sum of the four age scores and either 
R or A is identical with the correlation 
between the mean of these four scores 
and R or A, we shall discuss the efficacy 
of OA as a predictor of R or A where
OA = O = M + H + W + D. The
symbol Cov RO will be used to refer to 
the covariance between R and O scores 
while the symbol Var O will be used to 
refer to the variance of the O scores. 

Consider reading (R) as the criterion. 
Then 


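\[
\begin{aligned}
\operatorname{Cov} RO &= \operatorname{Cov} R(M + H + W + D) \\
&= \operatorname{Cov} RM + \operatorname{Cov} RH + \operatorname{Cov} RW + \operatorname{Cov} RD. \qquad [1]
\end{aligned}
\]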


But the various anatomical and physio- 
logical scores tend to bear a very low de- 
gree of relationship to reading achieve- 
ment so that Cov RM is large in relation 
to Cov RH + Cov RW + Cov RD. That 
is, the addition of H, W, and D to M does 
not supplement Cov RM to any marked 
extent, so that Cov RM accounts for most 
of Cov RO.

Next note that

\[
\begin{aligned}
\operatorname{Var} O &= \operatorname{Var}(M + H + W + D) \\
&= \operatorname{Var} M + \operatorname{Var} H + \operatorname{Var} W + \operatorname{Var} D \\
&\quad + 2\operatorname{Cov} MH + 2\operatorname{Cov} MW + 2\operatorname{Cov} MD \\
&\quad + 2\operatorname{Cov} HW + 2\operatorname{Cov} HD + 2\operatorname{Cov} WD. \qquad [2]
\end{aligned}
\]

It is clear from [2] that even if the covariances involving M with H, W, and D are small, the variance of M is a relatively small portion of the variance of O. Now the correlation between R and M is given by

\[ r_{RM} = \frac{\operatorname{Cov} RM}{\sqrt{(\operatorname{Var} R)(\operatorname{Var} M)}}, \qquad [3] \]

while the correlation between R and O is given by

\[ r_{RO} = \frac{\operatorname{Cov} RO}{\sqrt{(\operatorname{Var} R)(\operatorname{Var} O)}}. \qquad [4] \]

As we have indicated, the numerator of [3] differs little from that of [4], while the denominator of [4] is much greater than that of [3] due to the fact that Var O must necessarily be much greater than Var M. Hence it is a simple mathematical fact that the correlation between R and O must be less than that between R and M.

It, of course, follows in general that mental age alone is a much more useful predictor of school achievement than is mental age in equally weighted combination with various anatomical and physiological age scores, that is, OA. Moreover,
the more physiological and anatomical 
age scores used in determining OA, the 
greater the attenuation of the correlation 
between OA and a school achievement 
criterion. 
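
The argument of Formulas [1], [2], and [4] can be illustrated numerically. The following Python sketch is not the authors' computation: the covariance matrix is hypothetical (standardized scores, with only the R-M value of .645 taken from the text), and the function name `corr_with_composite` is an assumption introduced for illustration.

```python
from math import sqrt


def corr_with_composite(cov, criterion, components):
    """Correlation between a criterion and the unweighted sum of components.

    `cov` is a full covariance matrix (list of lists); `criterion` and
    `components` are row/column indices into it.
    """
    # Formula [1]: covariance with a sum is the sum of the covariances.
    cov_with_sum = sum(cov[criterion][j] for j in components)
    # Formula [2]: variance of a sum is the sum of all variances and
    # covariances among the components (each covariance counted twice).
    var_of_sum = sum(cov[i][j] for i in components for j in components)
    # Formula [4] (or [3] when there is a single component).
    return cov_with_sum / sqrt(cov[criterion][criterion] * var_of_sum)


# Hypothetical standardized covariance matrix, order: R, M, H, W, D.
# Only the R-M entry (.645) comes from the text; the rest are invented.
cov = [
    [1.000, 0.645, 0.10, 0.10, 0.10],
    [0.645, 1.000, 0.10, 0.10, 0.10],
    [0.100, 0.100, 1.00, 0.60, 0.40],
    [0.100, 0.100, 0.60, 1.00, 0.40],
    [0.100, 0.100, 0.40, 0.40, 1.00],
]
R, M, H, W, D = range(5)

r_rm = corr_with_composite(cov, R, [M])           # mental age alone
r_ro = corr_with_composite(cov, R, [M, H, W, D])  # organismic age (sum)

print(f"r(R, M) = {r_rm:.3f}")
print(f"r(R, O) = {r_ro:.3f}")
```

With these invented values the numerator grows only from .645 to .945, while the variance of the composite grows from 1 to 7.4, so the correlation drops from .645 to about .35. The size of the drop depends on the hypothetical matrix and is not meant to reproduce any figure reported by the authors.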

To show how marked this attenuation 
actually becomes, Formulas [1], [2], and 
[4] were applied to the products matrix 
used in the least squares analysis reported 
in our previous paper. In this paper the 
correlation between R and M was reported 
as .645. When H, W, and D are added to 
M to form O, the correlation of R with 
O is only .24. With arithmetic achieve- 
ment as the criterion, the correlation be- 
tween A and M previously reported was 
.551. In this case when H, W, and D are
added to M to form O, the correlation of 
A with O is .21. 

In brief, there are neither theoretical 
nor empirical bases for believing that 
organismic age predicts school achieve- 
ment. This is not to say that OA may 
not be useful in predicting other types of 
behavior. However, evidence of such use- 
fulness is not as yet generally available. 
TEACHER COMMENTS AND STUDENT PERFORMANCE: A
SEVENTY-FOUR CLASSROOM EXPERIMENT IN
SCHOOL MOTIVATION¹


ELLIS BATTEN PAGE
University of California, Los Angeles²


Each year teachers spend millions of hours marking and writing comments upon papers being returned to students, apparently in the belief that their words will produce some result, in student performance, superior to that obtained without such words. Yet on this point solid experimental evidence, obtained under genuine classroom conditions, has been conspicuously absent. Consequently each teacher is free to do as he likes; one will comment copiously, another not at all. And each believes himself to be right.

The present experiment investigated the questions: 1. Do teacher comments cause a significant improvement in student performance? 2. If comments have an effect, which comments have more than others, and what are the conditions, in students and class, conducive to such effect? The questions are obviously important for secondary education, educational psychology, learning theory, and the pressing concern of how a teacher can most effectively spend his time.

¹ Portions of this paper were read at the [illegible] Research Conference of the American Educational Research Association at San Francisco, March 8, 1958. This research depends upon cooperation from many persons. Space limitations prevent the listing of their names. The writer is especially indebted to the teachers who freely donated time and [illegible] after having been randomly selected. Without their participation the study obviously would have been impossible.

² Where this study was conceived as part of a doctoral dissertation. The study was conducted in the San Diego City and County Schools while the writer was with San Diego Junior College. He is presently Coordinator of Guidance, Eastern Michigan College.


PREVIOUS RELATED WORK


Previous investigations of “praise” and 
“blame,” however fruitful for the general 
psychologist, have for the educator been 
encumbered by certain weaknesses: Treat- 
ments have been administered by persons 
who were extraneous to the normal class 
situation. Tests have been of a contrived 
nature in order to keep students (un- 
realistically) ignorant of the true com- 
parative quality of their work. Comments 
of praise or blame have been administered 
on a random basis, unlike the classroom 
where their administration is not at all 
random. Subjects have often lacked any 
independent measures of their perform- 
ance, unlike students in the classroom. 
Areas of training have often been those 
considered so fresh that the students would 
have little previous history of related suc- 
cess or failure, an assumption impossible 
to make in the classroom. There have 
furthermore been certain statistical errors: tests of significance have been conducted as if students were totally independent of one another, when in truth they were members of a small number of groups with, very probably, some group effects upon the experimental outcome.

For the educator such experimental 
deviations from ordinary classroom condi- 
tions have some grave implications, ex- 
plored elsewhere by the present writer (5). 
Where the conditions are highly con- 
trived, no matter how tight the controls, 
efforts to apply the findings to the or- 
dinary teacher-pupil relationship are at 
best rather tenuous. This study was there- 
fore intended to fill both a psychological 
and methodological lack by leaving the 
total classroom procedures exactly what 
they would have been without the experi- 
ment, except for the written comments 
themselves. 


METHOD 


Assigning the subjects. Seventy-four 
teachers, randomly selected from among 
the secondary teachers of three districts, 
followed detailed printed instructions in 
conducting the experiment. By random 
procedures each teacher chose one class to 
be subject from among his available 
classes.³ As one might expect, these classes
represented about equally all secondary 
grades from seventh through twelfth, and 
most of the secondary subject-matter 
fields. They contained 2,139 individual 
students. 

First the teacher administered whatever 
objective test would ordinarily come next 
in his course of study; it might be 
arithmetic, spelling, civics, or whatever.
He collected and marked these tests in 
his usual way, so that each paper ex- 
hibited a numerical score and, on the 
basis of the score, the appropriate letter 
grade A, B, C, D, or F, each teacher 
following his usual policy of grade dis- 
tribution. Next, the teacher placed the 
papers in numerical rank order, with the 
best paper on top. He rolled a specially marked die to assign the top paper to the No Comment, Free Comment, or Specified Comment group. He rolled again, assigning the second-best paper to one of the two remaining groups. He automatically assigned the third-best paper to the one treatment group remaining. He then repeated the process of rolling and assigning with the next three papers in the class, and so on until all students were assigned.

³ Certain classes, like certain teachers, would be ineligible for a priori reasons: giving no objective tests, etc.

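
The assignment procedure just described amounts to a randomized block design with blocks ("levels") of three papers. A minimal Python sketch follows, under stated assumptions: a shuffled list stands in for the rolls of the marked die, the function and variable names are hypothetical, and papers left over when the class size is not a multiple of three are simply left unassigned (the text does not say how such cases were handled).

```python
import random

TREATMENTS = ("No Comment", "Free Comment", "Specified Comment")


def assign_treatments(first_test_scores, rng):
    """Map each student to a treatment, one of each treatment per level.

    `first_test_scores` maps student -> numerical score on the first test.
    `rng` is a random.Random instance (a stand-in for the marked die).
    """
    # Place the papers in rank order, best paper on top.
    ranked = sorted(first_test_scores, key=first_test_scores.get, reverse=True)
    usable = len(ranked) - len(ranked) % 3  # assumption: drop leftover papers
    assignment = {}
    for start in range(0, usable, 3):
        level = ranked[start:start + 3]
        order = list(TREATMENTS)
        rng.shuffle(order)  # random order of the three treatments
        for student, treatment in zip(level, order):
            assignment[student] = treatment
    return assignment


# Hypothetical class of 30 students with distinct scores.
scores = {f"student_{i:02d}": 100 - i for i in range(30)}
assignment = assign_treatments(scores, random.Random(0))
print(len(assignment), "students assigned in", len(assignment) // 3, "levels")
```

Because each level of three adjacent papers receives all three treatments, the treatment groups are matched on first-test performance, which is what later allows ranking within levels.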

Administering treatments. The teacher returned all test papers with the numerical score and letter grade, as earned. No Comment students received nothing else. Free Comment students received, in addition, whatever comment the teacher might feel it desirable to make. Teachers were instructed: “Write anything that occurs to you in the circumstances. There is not any ‘right’ or ‘wrong’ comment for this study. A comment is ‘right’ for the study if it conforms with your own feelings and practices.” Specified Comment students, regardless of teacher or student differences, all received comments designated in advance for each letter grade, as follows:

A: Excellent! Keep it up. 

B: Good work. Keep at it. 

C: Perhaps try to do still better? 

D: Let’s bring this up. 

F: Let’s raise this grade! 
Teachers were instructed to administer the comments “rapidly and automatically, trying not even to notice who the students are.” This instruction was to prevent any extra attention to the Specified Comment students, in class or out, which might confound the experimental results. After the comments were written on each paper and recorded on the special sheet for the experimenter, the test papers were returned to the students in the teacher's customary way.

It is interesting to note that the student subjects were totally naive. In other psychological experiments, while often not aware of precisely what is being tested, subjects are almost always sure that something unusual is underway. In 69 of the present classes there was no discussion by teacher or student of the comments being returned. In the remaining five the teachers gave ordinary brief instructions to “notice comments” and “profit by them,” or similar remarks. In none of the classes were students reported to seem aware or suspicious that they were experimental subjects.
Criterion. Comment effects were judged by the scores achieved on the very next objective test given in the class, regardless of the nature of that test. Since the 74 tests would naturally differ from each other in subject matter, difficulty, and every other testing variable, they obviously presented some unusual problems. When the tests were treated primarily as ranking instruments, however, some of the difficulties disappeared.

A class with 30 useful students, for example, formed just 10 levels on the basis of scores from the first test. Each level consisted of three students, with each student receiving a different treatment: No Comment, Free Comment, or Specified Comment. Students then achieved new scores on the second (criterion) test, as might be illustrated in Table 1, Part A. On the basis of such scores, they were assigned rankings within levels, as illustrated in Table 1, Part B.

If the comments had no effects, the sums of the columns of Part B would not differ except by chance, and the two-way analysis of variance by ranks would be used to determine whether such differences exceeded chance.⁴ Then the sums of ranks themselves could be ranked. (In Part B the rankings would be 1, 3, and 2 for Groups N, F, and S; the highest score is ranked 3 throughout the study.) And a new test, of the same type, could be made of all such rankings from the 74 experimental classrooms. Such a test was for the present design the better alternative, since it allowed for the likelihood of “Type G errors” (3, pp. 9-10) in the experimental outcome. Still a third way remained to use these rankings. The summation of each column could be divided by the number of levels in the class, and the result was a mean rank within treatment within class. This score proved very useful, since it fulfilled certain requirements for parametric data.


TABLE 1
ILLUSTRATION OF RANKED DATA

              Part A                    Part B
         (Raw scores on          (Ranks-within-levels
          second test)             on second test)
Level     N     F     S           N     F     S
  1      33    31    34           2     1     3
  2      30    25    32           2     1     3
  3      29    33    23           2     3     1
  .       .     .     .           .     .     .
 10      14    25    21           1     3     2
Sum:                             19    21    20

Note. N is No Comment; F is Free Comment; S is Specified Comment.


⁴ The present study employed a new formula,

\[ \chi_r^{2} = \frac{6 \sum (O - E)^{2}}{\sum O}, \]

which represents a simplification of Friedman's twenty-year-old notation (2). The new form is the classic chi square,

\[ \sum \frac{(O - E)^{2}}{E}, \]

multiplied by 6/k where k is simply the number of ranks. This conversion was discovered in connection with the present study by a collaboration of the writer with Alan Waterman and David Wiley. Proof that it is identical with the earlier and more cumbersome variation,

\[ \chi_r^{2} = \frac{12}{Nk(k + 1)} \sum (R)^{2} - 3N(k + 1), \]

will be included in a future statistical article.
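
The equivalence claimed in the footnote between the simplified formula and Friedman's original form can be checked numerically. The Python sketch below is an illustration, not the author's proof: the function names are hypothetical, and the inputs are the rank sums printed in the article's Table 2 (the Friedman test) and in Table 1, Part B. The number of levels N is recovered from the identity that the rank sums total Nk(k + 1)/2.

```python
def friedman_simplified(rank_sums):
    """Simplified form: 6 * sum((O - E)^2) / sum(O)."""
    total = sum(rank_sums)
    expected = total / len(rank_sums)  # E is the same for every column
    return 6 * sum((o - expected) ** 2 for o in rank_sums) / total


def friedman_classic(rank_sums, n_levels):
    """Friedman's form: 12 / (N k (k + 1)) * sum(R^2) - 3 N (k + 1)."""
    k = len(rank_sums)
    sum_sq = sum(r * r for r in rank_sums)
    return 12 / (n_levels * k * (k + 1)) * sum_sq - 3 * n_levels * (k + 1)


def levels_from_rank_sums(rank_sums):
    """Recover N from the identity sum(O) = N k (k + 1) / 2."""
    k = len(rank_sums)
    return 2 * sum(rank_sums) / (k * (k + 1))


individual = [1363, 1488, 1427]      # rank sums for N, F, S (individual subjects)
class_groups = [129.5, 170.0, 144.5]  # rank sums for N, F, S (class-groups)
table1_part_b = [19, 21, 20]          # illustrative class of 10 levels

for label, sums in [("Individual subjects", individual),
                    ("Class-group subjects", class_groups),
                    ("Table 1, Part B", table1_part_b)]:
    n = levels_from_rank_sums(sums)
    print(f"{label}: simplified = {friedman_simplified(sums):.4f}, "
          f"classic = {friedman_classic(sums, n):.4f}")
```

Both forms give about 10.959 and 11.331 for the two rows of the Friedman table, in agreement with the printed values, and 0.2 for the small illustrative class.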


RESULTS 


Comment vs. no comment. The over-all 
significance of the comment effects, as 
measured by the analysis of variance by 
ranks, is indicated in Table 2. The first 
row shows results obtained when students 
were considered as matched independently 
from one common population. The second 
row shows results when treatment groups 
within classes were regarded as intact 
groups. In either case the conclusions were 
the same. The Specified Comment group, 
which received automatic impersonal comments according to the letter grade received, achieved higher scores than the
No Comment group. The Free Comment 
group, which received individualized com- 
ments from the teachers, achieved the 
highest scores of all. Not once in a hundred 
times would such differences have occurred 
by chance if scores were drawn from a 
common population. Therefore it may be 
held that the comments had a real and 
beneficial effect upon the students’ mas- 
tery of subject matter in the various ex- 
perimental classes. 

It was also possible, as indicated earlier, to use the mean ranks within treatments within classes as parametric scores. The resulting distributions, being normally distributed and fulfilling certain other assumptions underlying parametric tests, permitted other important comparisons to be made.⁵ Table 3 shows the mean-ranks data necessary for such comparisons.


TABLE 2
THE FRIEDMAN TEST OF THE OVER-ALL TREATMENT EFFECTS

Units considered          N        F        S       df    χ²_r       p
Individual Subjects      1363     1488     1427     2    10.9593    < .01
Class-group Subjects     129.5    170.0    144.5    2    11.3310    < .01


TABLE 3
PARAMETRIC DATA BASED UPON MEAN RANKS WITHIN TREATMENTS WITHIN CLASSES

Source                             N         F         S        Total
Number of Groups                   74        74        74       222
Sum of Mean Ranks                140.99    154.42    148.59    444.00
Sum of Squares of Mean Ranks     273.50    327.50    304.01    905.01
Mean of Mean Ranks                1.905     2.087     2.008     2.000
S.D. of Mean Ranks                 .259      .265      .276
S.E. of Mean Ranks                 .030      .031      .032


TABLE 4
ANALYSIS OF VARIANCE OF MAIN TREATMENT EFFECTS
(Based on Mean Ranks)

Source                           Sum of Squares    df     Mean Square    F      Probability
Between Treatments: N, F, S           ...          ...        ...        ...        ...
Between Class-groups                  ...          ...        ...
Interaction: T × Class               15.78         146        ...
Total                                17.01         221

Note. Modeled after Lindquist (3), p. 157 et passim, except for unusual conditions noted.
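
The over-all analysis of variance on mean ranks can be retraced from the sums printed in Table 3. The Python sketch below is a reconstruction under assumptions, not the author's worksheet: it takes the sum of squares between class-groups to be zero (every class has a mean rank of exactly (k + 1)/2), uses the treatments-by-classes interaction as the error term, and introduces the hypothetical function name `mean_rank_anova`.

```python
def mean_rank_anova(treatment_sums, sum_of_squares, n_classes):
    """One-way-within ANOVA on mean ranks, with class effects fixed at zero.

    treatment_sums: dict of treatment -> sum of mean ranks over classes
    sum_of_squares: total of the squared mean ranks (all cells)
    n_classes: number of class-groups contributing to each treatment sum
    """
    k = len(treatment_sums)
    grand_total = sum(treatment_sums.values())
    correction = grand_total ** 2 / (n_classes * k)
    ss_total = sum_of_squares - correction
    ss_treatments = sum(t * t for t in treatment_sums.values()) / n_classes - correction
    # Assumption: no sum of squares between classes, so the remainder
    # is entirely treatments-by-classes interaction.
    ss_interaction = ss_total - ss_treatments
    df_treatments = k - 1
    df_interaction = (k - 1) * (n_classes - 1)
    f_ratio = (ss_treatments / df_treatments) / (ss_interaction / df_interaction)
    return {
        "ss_total": ss_total,
        "ss_treatments": ss_treatments,
        "ss_interaction": ss_interaction,
        "df_treatments": df_treatments,
        "df_interaction": df_interaction,
        "f_ratio": f_ratio,
    }


# Sums taken from Table 3 (mean ranks within treatments within classes).
anova = mean_rank_anova(
    treatment_sums={"N": 140.99, "F": 154.42, "S": 148.59},
    sum_of_squares=905.01,
    n_classes=74,
)
for name, value in anova.items():
    print(f"{name}: {value:.4f}")
```

Run on the Table 3 figures, the sketch returns a total sum of squares of 17.01 and an interaction sum of squares of about 15.78 on 146 degrees of freedom, matching the legible entries of Table 4, and an F ratio of roughly 5.7, which is consistent with the statement that the differences were significant beyond the .01 level.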
The various tests are summarized in Tables 4 and 5. The over-all F test in Table 4 duplicated, as one would expect, the result of the Friedman test, with differences between treatment groups still significant beyond the .01 level. Comparisons between different pairs of treatments are shown in Table 5. All differences were significant except that between Free Comment and Specified Comment. It was plain that comments, especially the individualized comments, had a marked effect upon student performance.

Comments and schools. One might question whether comment effects would vary from school to school, and even whether schools might not be the more appropriate unit of analysis. Since, as it happened, the study had 12 junior or senior high schools which had three or more experimental classes, these schools were analyzed in a treatments-by-replications design. Results of the analysis are shown in Table 6. Schools apparently had little measurable influence over treatment effect.

⁵ It may be noted that the analysis of variance based upon such mean ranks will require no calculation of sums of squares between levels or between classes. This is true because the mean for any class is (k + 1)/2, or in the present study just 2.00. An alternative to such scores would be the conversion of all scores to T scores based upon each class-group's distribution; but the mean ranks, while very slightly less sensitive, are much simpler to compute and therefore less subject to error.


TABLE 5
DIFFERENCES BETWEEN MEANS OF THE TREATMENT GROUPS

Comparison          Difference    S.E. of Diff.     t       Probability
Between N and F        .182           .052         3.500    <.001
Between N and S        .103           .054         1.907    <.05
Between F and S        .079           .056         1.411    <.10 (n.s.)

Note. The t tests presented are those for matched pairs, consisting of the paired mean ranks of the treatment groups within the different classes. Probabilities quoted assume that one-tailed tests were appropriate.


Comments and school years. It was believed that students, with increasing age and grade-placement, might become increasingly independent of comments and


other personal attentions from their teach- 
ers. To test such a belief, 66 class-groups, 
drawn from the experimental classes, were 
stratified into six school years (Grades 
7-12) with 11 class-groups in each school 
year. Still using mean ranks as data, sum- 
mations of such scores were as shown in 
Table 7. Rather surprisingly, no uniform 
trend was apparent. When the data were 
tested for interaction of school year and 
comment effect (see Table 8), school year 
did not exhibit a significant influence upon 
comment effect. 

Though Table 8 represents a compre- 
hensive test of school-year effect, it was 
not supported by all available evidence. 
Certain other, more limited tests did 
show significant differences in school year, 
with possibly greater responsiveness in 
higher grades. The relevant data (6, chap. 
5) are too cumbersome for the present 
report, and must be interpreted with cau- 
tion. Apparently, however, comments do 
not lose effectiveness as students move 
through school. Rather they appear fairly 
important, especially when individualized, 
at all secondary levels. 

One must remember that, between the 
present class-groupings, there were many 
differences other than school year alone. 
Other teachers, other subject-matter fields, 
TABLE 6
THE INFLUENCE OF THE SCHOOL UPON THE TREATMENT EFFECT

Source | Sum of Squares | df | Mean Square | Probability
Between Treatments: N, F, S | .172 | 2 | .086 | n.s.
Between Schools | .000 | 11 | .000 |
Between Classes Within Schools (pooled) | .000 | 24 | .000 |
Interaction: T X Schools | 1.937 | 22 | .088 |
Interaction: T X Cl. W. Sch. (pooled) | 4.781 | 48 | .099 |
Total | 6.890 | 107 | |


Note.—Modified for mean-rank data from Edwards (1, p. 295 et passim). 


2 Absence of an important main treatment effect is probably caused by necessary restriction of sample for school 
year (N is 36, as compared with Total N of 74), and by some chance biasing. 


TABLE 7 


SUMS OF MEAN RANKS FOR DIFFERENT
SCHOOL YEARS


School Year N F S 
12 21.08 22.92 22.00 
11 19.06 23.91 23.03 
10 20.08 23.32 22.60 
9 22.34 22.06 21.60 
8 21.21 22.39 22.40 
7 22.04 22.98 20.98 


Note.—Number of groups is 11 in each cell. 


other class conditions could conceivably 
have been correlated beyond chance with 
school year. Such correlations would in 
some cases, possibly, tend to modify the 
visible school-year influence, so that illu- 
sions would be created. However possible, 
such a caution, at present, appears rather
empty. In absence of contradictory evidence,
it would seem reasonable to extrapolate
the importance of comment to
other years outside the secondary range.
One might predict that comments would
appear equally important if tested under
comparable conditions in the early college
years. Such a suggestion, in view of the
large lecture halls and detached professors
of higher education, would appear
one of the more striking experimental results.

TABLE 8
THE INFLUENCE OF SCHOOL YEAR UPON TREATMENT EFFECT

Source | Sum of Squares | df | Mean Square | F | Probability
Between Treatments: N, F, S | 1.06 | 2 | .530 | 5.25 | <.01
Between School Years | 0.00 | 5 | .000 | |
Between Cl. Within Sch. Yr. (pooled) | 0.00 | 60 | .000 | |
Interaction: T X School Year | 1.13 | 10 | .113 | 1.12 | (n.s.)
Interaction: T X Class (pooled) | 12.11 | 120 | .101 | |
Total | 14.30 | 197 | | |

Note.—Modified for mean-rank data from Edwards (1, p. 295 et passim).

Comments and letter grades. In a questionnaire
made out before the experiment,
each teacher rated each student in his
class with a number from 1 to 5, according
to the student’s guessed responsiveness to
comments made by that teacher. Top
rating, for example, was paired with the
description: “Seems to respond quite unusually
well to suggestions or comments
made by the teacher of this class. Is
quite apt to be influenced by praise, correction,
etc.” Bottom rating, on the other
hand, implied: “Seems rather negativistic
about suggestions made by the teacher.
Is inclined more than most students
to do just the opposite from what the teacher
suggests.” In daily practice, many teachers
comment on some papers and not on
others. Since teachers would presumably
be more likely to comment on papers of
those students they believed would respond
positively, such ratings were an
important experimental variable.

Whether teachers were able to predict
responsiveness is a complicated question,
not to be reported here. It was thought,
however, that teachers might tend to
believe their able students, their high
achievers, were also their responsive students.
A contingency table was therefore
made, testing the relationship between
guessed responsiveness and letter grade
achieved on the first test. The results were
as expected. More “A” students were
regarded as highly responsive to comments
than were other letter grades; more “F”
students were regarded as negativistic and
unresponsive to comments than were
other letter grades; and grades in between
followed the same trend. The over-all C
coefficient was .36, significant beyond the
.001 level.⁶ Plainly teachers believed that
their better students were also their more
responsive students.

If teachers were correct in their belief,
one would expect in the present experiment
a greater comment effect for the better
students than for the poorer ones. In fact,
one might not be surprised if, among the
“F” students, the No Comment group
were even superior to the two comment
groups.


6 In a 5 X 5 table, a perfect correlation
expressed as C would be only about .9
(McNemar [4], p. 205).
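The ceiling on C mentioned in the footnote follows from the usual definition of the contingency coefficient. A minimal sketch, assuming the standard Pearson formula C = sqrt(chi-square / (chi-square + N)) and the standard upper bound sqrt((k - 1) / k) for a square k x k table; the function names are illustrative and do not come from the original report.

```python
import math


def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient C for a chi-square based on n cases."""
    return math.sqrt(chi_square / (chi_square + n))


def max_contingency_coefficient(k):
    """Largest value C can reach in a square k x k table."""
    return math.sqrt((k - 1) / k)


# Ceiling for the 5 x 5 table of guessed responsiveness by letter grade.
c_max = max_contingency_coefficient(5)
print(round(c_max, 3))
```

Under these assumptions the ceiling for a 5 x 5 table comes to roughly .89, which is why an obtained C of .36 represents a stronger association than the bare figure suggests.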
TABLE 9


MEAN OF MEAN RANKS FOR DIFFERENT
LETTER GRADES


Letter Grade N F S
A 1.93 2.04 2.03
B 1.91 2.11 1.98
C 1.90 2.06 2.04
D 2.05 1.99 1.96
F 1.57 2.55 1.88


Note.—Each eligible class was assigned one mean rank 
for each cell of the table. 


The various letter grades achieved mean 
scores as shown in Table 9, and the analy- 
sis of variance resulted as shown in Table 
10. There was considerable interaction be- 
tween letter grade and treatment effect, 
but it was caused almost entirely by the 
remarkable effect which comments appeared
to have on the “F” students. None
of the other differences, including the par- 
tial reversal of the “D” students, exceeded 
chance expectation. 

These data do not, however, represent 
the total sample previously used, since 
the analysis could use only those student 
levels in which all three students received 
the same letter grade on Test One.⁷ Therefore
many class-groups were not represented
at all in certain letter grades. For
example, although over 10% of all letter 
grades were “F,” only 28 class-groups had 
even one level consisting entirely of “F” 
grades, and most of these classes had only 
one such level. Such circumstances might 
cause a somewhat unstable or biased esti- 
mate of effect. 

Within such limitations, the experiment 


7 When levels consisted of both “A” and
“B” students, for example, “A” students
would tend to receive the higher scores on
the second test, regardless of treatment;
thus those Free Comment “A” students
drawn from mixed levels would tend to
appear (falsely) more responsive than the
Free Comment “B” students drawn from
mixed levels, etc. Therefore the total sample
was considerably reduced for the letter-grade
analysis.
TABLE 10
THE RELATION BETWEEN LETTER GRADE AND TREATMENT EFFECT

Source | Sum of Squares | df | Mean Square | F | Probability
Between Treatments: N, F, S | 2.77 | 2 | 1.385 | 5.41 | <.01
Between Letter Grades | 0.00 | 4 | 0.000 | |
Bet. Blocks Within L. Gr. (pooled) | 0.00 | 65 | 0.000 | |
Interaction: T X Letter Grades | 4.88 | 8 | .610 | 2.40 | .05>p>.01
Residual (error term) | 32.99 | 130 | .254 | |
Total | 40.64 | 209 | | |


Note—Modified for mean-rank data from Lindquist (3, p. 269). Because sampling was irregular (see text) all 
eligible classes were randomly assigned to 14 groupings. This was done arbitrarily to prevent vacant cells. 


provided strong evidence against the 
teacher-myth about responsiveness and 
letter grades. The experimental teachers 
appeared plainly mistaken in their faith 
that their “A” students respond relatively 
brightly, and their “F” students only 
sluggishly or negatively to whatever en- 
couragement they administer. 
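The interaction test on which this conclusion rests can be retraced from the sums of squares and degrees of freedom printed in Table 10. The sketch below assumes only the ordinary definitions (mean square = sum of squares / df, F = effect mean square / error mean square); the helper names are illustrative and are not the author's.

```python
def mean_square(sum_of_squares, df):
    """Mean square: a sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F ratio of an effect mean square to the error mean square."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_error, df_error)


# Table 10: Interaction T X Letter Grades (SS 4.88, df 8)
# tested against the residual error term (SS 32.99, df 130).
f_interaction = f_ratio(4.88, 8, 32.99, 130)
print(round(f_interaction, 2))
```

The result rounds to the tabled F of 2.40 for the interaction of treatment with letter grade.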


SUMMARY 


Seventy-four randomly selected second- 
ary teachers, using 2,139 unknowing stu- 
dents in their daily classes, performed the 
following experiment: They administered 
to all students whatever objective test 
would occur in the usual course of in- 
struction. After scoring and grading the 
test papers in their customary way, and 
matching the students by performance, 
they randomly assigned the papers to one 
of three treatment groups. The No Com- 
ment group received no marks beyond 
those for grading. The Free Comment 
group received whatever comments the 
teachers felt were appropriate for the 
particular students and tests concerned. 
The Specified Comment group received 
certain uniform comments designated be- 
forehand by the experimenter for all simi- 
lar letter grades, and thought to be gen- 
erally “encouraging.” Teachers returned 
tests to students without any unusual at- 
tention. Then teachers reported scores 
achieved on the next objective test given 


in the class, and these scores became the 
criterion of comment effect, with the fol- 
lowing results: 

1. Free Comment students achieved 
higher scores than Specified Comment stu- 
dents, and Specified Comments did better 
than No Comments. All differences were 
significant except that between Free Com- 
ments and Specified Comments. 

2. When samplings from 12 different 
schools were compared, no significant dif- 
ferences of comment effect appeared be- 
tween schools. 

3. When the class-groups from six dif- 
ferent school years (grades 7-12) were 
compared, no conclusive differences of 
comment effect appeared between the 
years, but if anything senior high was 
more responsive than junior high. It would 
appear logical to generalize the experi- 
mental results, concerning the effective- 
ness of comment, at least to the early 
college years. 

4. Although teachers believed that their
better students were also much more re- 
sponsive to teacher comments than their 
poorer students, there was no experimental 
support for this belief. 

When the average secondary teacher 
takes the time and trouble to write com- 
ments (believed to be “encouraging”) on 
student papers, these apparently have a 
measurable and potent effect upon stu- 


dent effort, or attention, or attitude, or 


i 
Ñ 4 
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whatever it is which causes learning to 
improve, and this effect does not appear 
dependent on school building, school year, 
or student ability. Such a finding would 
seem very important for the studies of 
classroom learning and teaching method. 
Olson (7) and Millard (6) report that
both the average of and the relationships 
among seven ages in months—height, 
weight, grip, dental, carpal, mental, and 
reading—are useful in appraising the 
child’s level of performance in other areas 
such as language and arithmetic. Klaus- 
meier (5), however, found no statistically 
significant differences in height, weight, 
grip, dentition, and carpal age between 
high- and low-achievers in arithmetic and 
language. With this finding the present 
study was undertaken to ascertain how 
useful the first seven measures were (a) 
when combined in a best-combination re- 
gression equation and (b) when considered 
of equal weight as in Olson’s system of 
organismic age, for predicting arithmetic 
and language achievement 12 months after 
the original measures were secured. 


PROCEDURE 


The subjects of this investigation were 
21 boys and 24 girls who were third- 
graders in 1955-56 and fourth-graders in 
1956-57 and 29 boys and 24 girls who were 
fifth-graders in 1955-56 and sixth-graders 
in 1956-57. These children were enrolled 
in four regular classes of two large elemen- 
tary schools of Madison and were all the 
children enrolled in the classes at the times 
the two sets of measures were secured. 

The following measures were obtained: 
standing height, weight, strength of grip 
of the preferred hand, number of perma- 
nent teeth, bone development of the wrist 
and hand, mental age with the California 
Short-Form Test of Mental Maturity 
(S-Form), and achievement in reading, 
arithmetic, and language with the Cali- 
fornia Achievement Tests (Complete Bat- 
tery, Form AA, Primary, Elementary, and 


Intermediate). In all instances results of a 
single measure were used except for 
strength of grip and carpal age. In secur- 
ing strength of grip, a first measure was 
taken with the palm of the hand upward, 
grasping the dynamometer; the second 
with the palm downward; and the third 
with the palm upward. The average of the 
highest two of the three measures was used 
as strength of grip. This method was found 
to yield a higher test-retest correlation, 
0.92, than using the single highest score 
with the palm upward, 0.83. All X-rays 
were read independently by the same two 
resident radiologists and the average of 
the two readings in months was used as 
carpal age. 
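The grip-strength score described above (the mean of the two highest of three dynamometer readings) is easy to state in code. This is a minimal sketch; the function name and the sample readings are hypothetical and serve only to illustrate the rule.

```python
def grip_strength(readings):
    """Average the two highest of three dynamometer readings (kilograms)."""
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    top_two = sorted(readings, reverse=True)[:2]
    return sum(top_two) / 2


# Hypothetical readings: palm up, palm down, palm up.
score = grip_strength([14.0, 12.5, 15.0])
print(score)
```

Dropping the lowest reading before averaging is presumably what gave the reported gain in test-retest correlation over the single best palm-up score.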

The above measures were secured in 
October, 1955, and again in October, 1956, 
within the same week for each of the four 
classroom groups, and each measure at ap- 
proximately the same hour of the day. 
Over this twelve-month interval, the meas- 
ures were found to be relatively consistent 
as the correlation coefficients in Table 1 
show. The primary level of the mental 
maturity and the achievement tests was 
used in 1955, the elementary level in 1956. 
The elementary level of the achievement 
battery was used in the fifth grade, the 
intermediate level in the sixth. No change 
was made in level of the mental maturity 
test. 

Evidence of reliability and validity of 
the measures is now given since measurement
is crucial in this study. In 1955, a
random sample of 30 children was drawn
from the total population of this study 
and remeasured within 24 hours after the 
first measuring of height, weight, and 
strength of grip. The test-retest correlations
were .99 for height, .99 for weight,
TABLE 1
CORRELATIONS BETWEEN MEASURES OBTAINED ONE YEAR APART, IN OCTOBER 1955 AND OCTOBER 1956

Measure | Third-Fourth Graders: Boys | Third-Fourth Graders: Girls | Fifth-Sixth Graders: Boys | Fifth-Sixth Graders: Girls
Height (inches) | .99 | .82 | .96 | .95
Weight (pounds) | .87 | .96 | .97 | .96
Grip (kilograms) | .64 | .83 | .76 | .62
Permanent teeth | | | |
Carpal Age (months) | .85 | .73 | .83 | .72
Mental Maturity Score | .50 | .43 | .76 | .86
Reading Test Score | .55 | .85 | .86 | .63
Arithmetic Score | | | |
Language Test Score | .87 | .66 | |


and .92 for grip. The two radiologists’ 
independent readings of all the X-rays 
showed a correlation of .95 for third- 
graders and .86 for fifth-graders. Checks 
of successive dental records by the re- 
searchers showed that the dentist had 
identified permanent teeth without error. 
Thus, reliability of the five physical meas- 
ures is considered high; and the test 
manuals report high reliability for the in- 
telligence and achievement measures used. 
In addition, the correlations on the 
achievement measures reported above over 
the 12-month period indicate quite high 
reliability. 


Concurrent validity of certain measures 
was also determined with the third-grade 
children. For a random sample of 30, 
California M.A. and Stanford-Binet M.A., 
obtained within six weeks of each other, 
correlated .82. Scores from the California 
Achievement Test in Reading correlated 
.90 with the Gates Advanced Primary
Reading Test. Also, each of the four 
teachers ranked their children from highest 
to lowest in reading, arithmetic, and lan- 
guage achievement. The resulting rank- 
order correlations between test scores and 
teacher ratings are: reading, .91, .91, .90,
and .88; arithmetic, .65, .82, .77, and .56;
language, .69, .83, .85, and .77. In each
set of four correlations, the first two are 
for third-grade and the last two for fifth- 
grade classes. Considering the difficulty of 
arranging a group of 30 children in rank 
order in each of the subject-matter areas, 
the researchers consider the correlations 
as indicating that the achievement tests 
measure the teachers’ objectives suffi- 
ciently well for the purpose of this study. 


FINDINGS 


In Table 2 are presented the mean 
Pearson product-moment correlations 
among the measures as obtained in Oc- 
tober, 1955. The mean correlations for the 
four groups of boys and girls are presented 
in Table 2 rather than each correlation on 
which the regression equations are based in 


TABLE 2
MEAN CORRELATIONS AMONG RAW SCORES IN NINE MEASURES OF BOYS AND GIRLS, THIRD AND FIFTH GRADES

Measure | Height | Weight | Grip | Dental | Carpal | Mental | Rdg. | Arith. | Lang.
Height | | .7[?] | .46 | .00 | .42 | .05 | .00 | .01 | .00
Weight | | | .65 | .02 | .35 | -.01 | -.04 | -.01 | -.02
Grip | | | | .13 | .39 | .12 | .15 | .18 | .02
Dental | | | | | .18 | -.07 | .05 | .10 | .04
Carpal | | | | | | .05 | .04 | .09 | .00
Mental | | | | | | | .62 | .64 | .59
Reading | | | | | | | | [illegible] | .76
Arithmetic | | | | | | | | | .74
TABLE 3


CONTRIBUTIONS OF SEVEN MEASURES TO THE CORRECTED MULTIPLE Rs AND BETA
WEIGHTS FOR THE REGRESSION EQUATION TO PREDICT ARITHMETIC ACHIEVEMENT


Height Weight Grip Dental Carpal | Mental | Reading
3rd boys
Beta weight .336 .819
R .890 (2) .830 (1)
3rd girls
Beta weight .219 .419 .534
R .766 (3) .734 (2) | .712 (1)
5th boys
Beta weight -.302 .371 -.258 .300 .423
R .748 (5) | .722 (4) | .719 (3) .695 (2) | .673 (1)
5th girls
Beta weight .210 .159 .423 .474
R .890 (3) | .906 (4) .886 (2) | .855 (1)


Note.—Blanks signify that the measure did not contribute .01 to the corrected Multiple R and did not then enter 


the regression equation. 


order to present a more concise summary. 
For a correlation to differ significantly
from 0 at the .05 level (2), it
must reach a value between .367 and .404, depending
upon the size of the N previously given.
Table 2 shows that no mean correlation 
between the five physical measures and the 
three achievement measures is significant 
at the .05 level and dentition does not cor- 
relate significantly with any physical 
measure. Of the original 80 correlations 
between physical and achievement meas- 
ures, only two, weight and language—fifth 
grade, were significant at the .05 level. 
However the other two measures com- 
prising the basis of organismic age, mental 
and reading, correlate positively and sig- 
nificantly with arithmetic and language 
achievement. 
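The significance thresholds quoted above can be recovered from a t table, since a correlation r based on N cases is tested with t = r * sqrt(df) / sqrt(1 - r^2), df = N - 2, which rearranges to a critical r of t / sqrt(t^2 + df). The sketch below applies that relation to the group sizes of 24 and 29 given earlier. The two-tailed .05 critical t values are copied from a standard table, and the names used are illustrative rather than the authors'.

```python
import math

# Two-tailed .05 critical values of t, keyed by degrees of freedom
# (taken from a standard t table).
T_CRIT_05 = {22: 2.074, 27: 2.052}


def critical_r(n):
    """Smallest |r| significant at the .05 level (two-tailed) for n cases."""
    df = n - 2
    t = T_CRIT_05[df]
    return t / math.sqrt(t * t + df)


r_n29 = critical_r(29)
r_n24 = critical_r(24)
print(round(r_n29, 3), round(r_n24, 3))
```

On these assumptions the two bounds in the text appear to correspond to the groups of 29 and 24 children respectively.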

Multiple correlations were computed, 
using the original correlations by grade 
and sex, and regression equations were de- 
rived to predict language and arithmetic 
achievement 12 months later. The multiple 
R and regression equation were calculated 
by the Wherry-Doolittle Test Selection 
Method (3). Any of the seven measures 
contributing .01 or more to the multiple 
R, uncorrected for shrinkage, were in- 


cluded in the multiple regression equation, 
provided their inclusion did not actually 
lower the multiple R when corrected for 
shrinkage. The Beta weights in the multi- 
ple regression equations for predicting 
scores were secured with the IBM 650 
computer by a method of inverse correla- 
tion matrices. Table 3 shows the corrected 
multiple Rs obtained between the seven 
organismic measures and arithmetic
achievement, the order in which the vari- 
ous measures went into the multiple cor- 
relations, and the Beta weights for each 
regression equation. Reading correlated 
higher than any other measure with arith- 
metic for the four groups and thus went 
first into the multiple R and the regression 
equation. Differences between groups in 
the order in which the seven measures went 
into the regression equations and differ- 
ences in Beta weights are not so impor- 
tant, however, as the finding that the best 
combination of all five physical measures 
increased the corrected multiple R by only 
.060 for the third-grade boys, by .032 for 
third-grade girls, by .053 for fifth-grade 
boys, and by .020 for fifth-grade girls. 
Table 4 shows similarly that the physical 
measures increased the corrected multiple 
TABLE 4
CONTRIBUTIONS OF SEVEN MEASURES TO THE CORRECTED MULTIPLE Rs AND BETA
WEIGHTS FOR THE REGRESSION EQUATION TO PREDICT LANGUAGE ACHIEVEMENT
Height | Weight Grip Dental Carpal | Mental | Reading
3rd boys
Beta weight
R .854 (1)
3rd girls
Beta weight .317 .264 -.341 .317 .527
R .766 (3) .788 (5) .759 (2) | .767 (4) | .746 (1)
5th boys
Beta weight .406 -.460 .211 .297 .360
R .826 (4) | .764 (3) .845 (5) | .662 (1) | .718 (2)
5th girls
Beta weight .209 | -.422 .664
R .763 (3) | .762 (2) .729 (1)


Note.—Blanks signify that the measure did not contribute .01 to the corrected Multiple R and did not then enter
the regression equation.


R for language above that obtained with 
reading or a combination of reading and 
mental age by .00 for third-grade boys,
.042 for third-grade girls, .127 for fifth-grade
boys, and .034 for fifth-grade girls.
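The idea of a multiple R "corrected for shrinkage" can be illustrated briefly. The report does not state which correction formula was applied, so the sketch below assumes one widely used form, adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - k - 1), where N is the number of cases and k the number of predictors. The function name and the example figures are hypothetical.

```python
import math


def shrunken_r(r, n, k):
    """Multiple R corrected for shrinkage with n cases and k predictors.

    Assumes the common adjustment 1 - (1 - R^2)(n - 1)/(n - k - 1);
    the original study may have used a different variant.
    """
    adjusted_r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    # A negative adjusted R^2 is conventionally reported as zero.
    return math.sqrt(max(adjusted_r2, 0.0))


# Hypothetical case: an uncorrected R of .90 from 24 children.
with_three = shrunken_r(0.90, 24, 3)
with_seven = shrunken_r(0.90, 24, 7)
print(round(with_three, 3), round(with_seven, 3))
```

With so few cases per group, each added predictor costs an appreciable amount of corrected R, which is consistent with the rule that a measure entered the equation only when it did not lower the corrected value.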

In an attempt to ascertain whether or- 
ganismic age is a better predictor of 
achievement in arithmetic and language 
than is the predicted score derived by 
means of regression equations, Pearson
product-moment correlations were calculated
between regression-equation predicted
scores and the raw scores in arithmetic
and language obtained one year
later, and also between organismic age in
months and the scores obtained one year 
later. These results are presented in 
Table 5. 


Table 5 indicates that the correlations 
are higher between the 1956 obtained and 
1956 predicted scores derived from re- 
gression equations than between the 1956 
obtained scores and the 1955 organismic 
age. Four of the eight correlations be- 
tween organismic age and predicted 
achievement in arithmetic and language 
are significant at the .05 level; but all the 
correlations between regression-equation 
predicted scores and obtained scores are 
significant beyond the .01 level. It is antic- 
ipated, of course, that were the same re- 
gression equations applied to other sam- 
ples, the obtained correlations between 
predicted and actual scores would be 


lower. 

TABLE 5
CORRELATIONS BETWEEN ORGANISMIC AGE PREDICTION, REGRESSION-EQUATION PREDICTION AND OBTAINED ARITHMETIC AND LANGUAGE SCORES

Group | N | Arithmetic Score and Organismic Age Prediction | Language Score and Organismic Age Prediction | Arithmetic Score and Regression-Equation Prediction | Language Score and Regression-Equation Prediction
3rd boys | 21 | .370 | .273 | .683 | .768
3rd girls | 24 | .039 | .491 | .731 | .665
5th boys | 29 | .380 | .472 | .629 | .657
5th girls | 24 | | .581 | .743 | .693

The present results are in accord with
those of Gates (4) and Blommers, Knief
and Stroud (1), who used designs different
from the present study.


SUMMARY 


This study was conducted to compare 
the efficiency of organismic age, the aver- 
age of seven ages, and regression equations, 
based on raw scores in the same seven 
measures, in predicting arithmetic and 
language achievements of third- and fifth- 
grade children 12 months after the original
measures were secured. The regression-equation
predictions correlated higher
with actual achievements than did organ- 
ismic-age predictions. The five physical 
measures in organismic age contributed 
little to mental and reading scores in pre- 
dicting arithmetic and language scores. 


H. J. KLAUSMEIER, A. BEEMAN, AND I. J. LEHMANN
SEX DIFFERENCES IN THE RETENTION OF
QUANTITATIVE INFORMATION


ROBERT SOMMER¹
The Saskatchewan Hospital, Weyburn 


Sex differences in arithmetic reasoning 
and spatial relations have consistently 
been found. Men are superior to women 
on these tests, while women excel on tasks
requiring verbal ability and memory (1, 
5). However, several authors (1, 5) state 
that the female superiority in recall does 
not hold if the material is more interesting 
to the males. This is a reasonable statement,
but there has been very little research
designed to investigate the reasons
for this. It is also interesting to note that 
there has been far more research re- 
lating to motivational factors in percep- 
tion than there has been research relating 
to motivational factors in memory. Research
on this latter point should be of
concern to educators and others who hope 
to teach subject matter to students who 
vary widely in their interest in the content 
of courses. 

This study had its origins in the observation
that during routine testing of
hospital patients with the Wechsler-Bellevue
Information Scale, men did consistently
better than women on items involving
estimations of size or distance.
Even intelligent women usually would be 
unaware of the population of the United 
States or the distance from New York to 
Paris. There seemed an inability to retain
this information. It should be stressed that
this was not a matter of computation or of
analytic reasoning; rather, it was a question
of the retention of information that
they had been exposed to a number of
times.


¹ This study was started by the writer and
John Hinkle, Prabha Khanna, and Walter
McDonald, and carried to conclusion by
the present writer. We would like to thank
J. B. Ray, G. McMurray, and H. Cooperstock
for use of their classes for testing.


This sex difference is not unknown to 
writers and educators. Weber (6) writes, 
“Mention mathematics to a woman and she
freezes into a condescending attitude of 
tolerance—she knows it exists, she uses 
it when she must, but it certainly has 
very little to do with her own delightfully 
imaginative and delicate world of inter- 
ests.” However our experience has been 
that this debility is more fundamental than 
simply a reflection of hostility to mathe- 
matical reasoning. Schilder (3) speaks of 
our remembering “only what we can and 
will use in the present situation.” If this 
is so, then investigation into the retention 
of information should lead us into some 
rather basic attitudes regarding what type 
of information women believe is useful.

The purpose of the present studies is to 
determine whether these sex differences in 
recall of sizes and distances, observed 
clinically with hospital patients, would ap- 
pear in other samples. Also of interest is 
whether there will be sex differences in the 
immediate recall of new quantitative ma- 
terial. 


EXPERIMENT ONE 


Procedure 


Two populations of Ss were sampled. 
The first consisted of patients in a mental 
hospital whose psychological test records 
were available in the files. The second 
was composed of students in elementary 
psychology classes at a small Midwestern 
college. The patient sample included only 
those whose last initials began with B, N, 
P, and T, who had been tested with the 
Information Scale of the Wechsler-Bellevue,
who were between 18 and 69 years
old, and who had IQs above 70. This
provided 156 cases, 96 males and 60 females.
The students were all between 18
and 28 years old and constituted a sample
of 61 males and 34 females.

TABLE 1
PERCENTAGE OF CORRECT RESPONSES TO QUANTITATIVE ITEMS

Item | Male patients (N = 96) | Female patients (N = 60) | p** | Male students (N = 61) | Female students (N = 34) | p**
Population (U.S.) | 41 | 18 | (.01) | 64 | 47 | (.08)
Distance | 36 | 10 | (.001) | 54 | 59 | (N.S.)
Pints | * | * | | 72 | 91 | (.02)
Teaspoons | * | * | | 13 | 70 | (.001)
Population (college town) | * | * | | 51 | 28 | (.05)

* Item either not administered or not scored for this group.
** All p values are based on chi-square tests.

The procedure for the patients was 
simply to tally and compare the number 
of correct “population of U.S.” and “dis- 
tance from New York to Paris” responses 
from males and females. (It can be 
noted that there was not a significant 
difference between the mean IQ of the 
males, 94.3, and that of the females, 95.6). 
The students were all tested in their class- 
rooms by an examiner who requested them 
to answer the following questions: 


1. How far is it from New York to
Paris?

2. What is the population of the United
States?

3. How many pints are there in a quart?

4. How many teaspoons are there in a
tablespoon?

5. What is the population of (the town
in which the college is located)?


TABLE 2
PERCENTAGE OF CORRECT RESPONSES TO QUANTITATIVE ITEMS

Item | Male students (N = 89) | Female students (N = 65) | p
Population (Canada) | 85 | 55 | .01
Distance | 54 | 25 | .01
Pints | 83 | 86 | N.S.
Teaspoons | 20 | 48 | .01
Population (university town) | 89 | 52 | .01


Results 


The responses to the information items 
are presented in Table 1. It is evident that 
there are very large differences in
estimating the population of the U.S.
(with males excelling) and in recalling the 
number of teaspoons in a tablespoon (with 
females excelling) while many of the other 
differences are moderately large. 
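The chi-square tests behind Table 1 can be approximated from the printed percentages and group sizes. The sketch below works through the U.S. population item for the patient groups. It assumes that each percentage can be converted back to a whole number of correct answers by rounding, and it uses the uncorrected 2 x 2 chi-square formula; whether the original analysis applied a continuity correction is not stated. All names are illustrative.

```python
def counts_from_percent(percent_correct, n):
    """Recover (correct, incorrect) counts from a rounded percentage."""
    correct = round(percent_correct / 100 * n)
    return correct, n - correct


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Population (U.S.) item: 41% of 96 male patients, 18% of 60 female patients.
male_correct, male_wrong = counts_from_percent(41, 96)
female_correct, female_wrong = counts_from_percent(18, 60)

chi2 = chi_square_2x2(male_correct, male_wrong, female_correct, female_wrong)
print(male_correct, female_correct, round(chi2, 2))
```

With one degree of freedom the .01 critical value of chi-square is 6.635, so a value of about 8.4 is in line with the probability of .01 printed for this item.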


EXPERIMENT Two 


In a study of this sort where a classifica- 
tion of Ss by sex is used, it is hazardous 
to speak of a genuine sex difference until 
a number of different groups have been 
sampled. Although the preceding table 
compared both college students and pa- 
tients, it seemed in order to replicate the 
study with a fresh sample. 

This time a group of 154 Canadian uni- 
versity students was used. The procedure 
followed was the same as in the previous 
study except that the questions were 
altered to suit the Canadian culture; that 
is, the Ss were asked the population of 
Canada, the population of their university 
town, etc. The results are presented in 
Table 2. It is clear that the sex differences 
in the previous sample are supported by
these results. The males do far better on 
the population and distance items, while 
the females excel in recalling the number 
of teaspoons in a tablespoon. There is no 
difference between the sexes in answering 
the pints in a quart item.


EXPERIMENT THREE 


With the procedures used in Experi- 
ments One and Two, we were not able to 
control the exposure of our Ss to the in- 
formation that was requested. Hence it 
was thought desirable to present new 
material to a group of Ss and see if the 
males would surpass the females in re- 
membering quantitative material. 


Procedure 


Two brief paragraphs were constructed, 
each containing both quantitative and non- 
quantitative material. Care was taken to 
see that the quantitative material should
be “new” to the Ss so that any differences
could not be attributed to previous exposure.
Hence the “facts” that were presented
were fabricated and for the most
part incorrect. The paragraphs were as
follows: 


The Swedish ship, the Queen Fredrika,
delivered its cargo of 12,000 pounds of wheat
to Bombay. This city of 1,500,000 in a
country of 264 million people is one of the
richest trading ports in the Far East. The
captain of the ship was Olaf Hansen.

Last week was the scene of a bloody
revolution in Venezuela. This country of
116,000 square miles is one of the richest
oil-producing centers in the world. More
than 1,200,000 barrels are shipped every
month. The other important exports are
tin, bananas, and cocoa.


The Ss in the present study were 49 
women and 27 men who were studying to 
be psychiatric nurses at a large mental 
hospital. All Ss had at least an 11th grade 
education. The Director of Nurses Train- 
ing reported that he did not feel that there 
was any difference between the men and 
the women in intelligence, except that the 
women “did better on the exams” than 
the men. However a control for IQ can 
be found in the number of items of non- 
quantitative material retained by the men 
and the women. 

The Ss were tested at their customary 
class sessions and were told that a para- 
graph would be read to them. When the 
examiner directed them to begin, they 
should write down all they remembered of 
it. The instruction to begin writing fol- 
lowed several seconds after the reading of 
each paragraph. 

TABLE 3
AVERAGE NUMBER OF ITEMS REMEMBERED (NURSES)

Group | Nonquantitative items: Avg. No. Remembered | SD | t | p | Quantitative items: Avg. No. Remembered | SD | t | p (one-tail)
Males (N = 27) | 13.44 | .72 | [illegible] | N.S. | 1.30 | .26 | [illegible] | [illegible]
Females (N = 49) | 13.24 | .58 | | | .79 | .12 | |

In scoring the recall data, the paragraphs
were divided into “sense units”
(similar in form to the units of the
Wechsler Memory Scale). For example,
/the Swedish ship/ the Queen Fredrika/
delivered its cargo/ of 12,000 pounds/ ...
etc. This yielded a total of 25 nonquantitative
units and 5 quantitative units. A
quantitative unit was scored as correct if 
the number was recalled correctly regard- 
less of whether the unit (pounds, bushels, 
etc.) was accurate. All information as to 
the sex of the respondents was removed 
and the scoring was done by the writer. 
Twenty of the protocols were also scored 
independently by another researcher. The 
coefficient of agreement between scorers 
was .93 for the nonquantitative scores and 
1.00 for the quantitative scores. 


Results 


The average number of quantitative 
items recalled by the men was 1.30 ± .26 
while the average number recalled by the 
women was .79 ± .12. This difference is 
significant by t test beyond the .05 
level. On the nonquantitative items, no 
difference in recall was expected. As is 
shown in Table 3, this prediction was also 
confirmed. 

As the level of significance of the dif- 
ference in Table 3 is not high, it was de- 
cided to repeat the procedure using a 
fresh sample. The paragraphs were read 
to students in two elementary sociology 
classes at the University of Saskatchewan. 
The procedure was identical to that used 
with the nurses. The results are presented 
in Table 4 and show that the female stu- 
dents do slightly better than the males 
in recalling the nonquantitative information, 
but the males do significantly better 
than the females in recalling the quanti- 
tative information. When the two samples 
are pooled, a chi-square test shows that 
the sex difference in recalling the quanti- 
tative items is significant beyond the .01 
level (χ² = 5.86, p < .01). 
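
The t ratios in Tables 3 and 4 can be approximated from the tabled means. The sketch below is illustrative only and rests on one assumption that the text does not state: that the columns headed "SD" are standard errors of the means, so that t is the difference between means divided by the square root of the sum of the squared standard errors. Under that assumption the nonquantitative values of Table 4 (15.50 with .36 for females, 15.06 with .41 for males) give the tabled t of .81. The function name is hypothetical.

```python
import math


def t_from_standard_errors(mean_a, se_a, mean_b, se_b):
    """Critical ratio for two independent means.

    Assumes se_a and se_b are standard errors of the means
    (an interpretation of the tabled "SD" columns, not stated in the text).
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff


# Table 4, nonquantitative items: females 15.50 (.36), males 15.06 (.41)
t_nonquant = t_from_standard_errors(15.50, 0.36, 15.06, 0.41)
print(round(t_nonquant, 2))  # 0.81
```

The same function can be applied to any other pair of tabled means, provided the standard-error reading of the "SD" columns holds for them as well.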


Discussion 


The results from Experiments One and 
Two confirm the prediction that female 
Ss would be poorer than male Ss on the 
two Wechsler Information items (popu- 
lation and distance) under consideration. 
It should be remembered that although 
these were designated as “quantitative 
items,” they did not involve computa- 
tion, judgement, or even analytic reason- 
ing. To answer a question dealing with 
the population of the U.S. is not ordi- 
narily a test in estimating size or number. 
No one has “seen” the population of the 
U.S., and few Ss will attempt to find the 
answer by dividing the world’s population 
by a set percentage. This item involves 
simply the retention of a word or num- 
ber that one has seen and heard many 
times. 

TABLE 4 
AVERAGE NUMBER OF ITEMS REMEMBERED (STUDENTS) 

                    Nonquantitative items     Quantitative items 
                    Avg. no.                  Avg. no. 
                    remembered     SD         remembered     SD 
Males (N = 36)      15.06          .41        1.42           .69 
Females (N = 74)    15.50          .36        1.00           .33 
t                   .81                       1.73 
p                   N.S.                      [illegible] (one-tail) 

One can attempt to explain these results 
on the basis of the greater familiarity 
of the male Ss with population and 
distance judgments and the females with 
pints and teaspoons. Yet this does not 
provide the whole answer, for it is 
apparent that both men and women have been 
exposed to all of these items a number of 
times, surely enough for learning. We are 
all familiar with people who know the 
number of feet in a mile and the formula 
E = mc² but are unable to remember the 
number of pints in a quart. Such for- 
getting is highly selective as was shown in 
the Levine and Murphy experiment (2). 
Experiment Three shows this difference 
appears with new material and cannot 
be attributed solely to a difference in pre- 
vious contact with the information. This 
should have implications for the teaching 
of mathematics and other subjects to 
female students. Apparently the poorer 
performance of female students on tests of 
mathematics is more fundamental than a 
distaste for computation or algebra. Fur- 
ther research is necessary to determine the 
extent of this debility. There are some clues 
in our research that this deficiency is not 
directly related to an antipathy to all num- 
bers. The females excelled in recalling the 
number of pints in a quart and teaspoons 
in a tablespoon. We also administered a 
brief test of digit span to the university 
students used in Experiment Three. Although 
the males had surpassed the females 
in recalling the quantitative information 
from the paragraphs, there was no dif- 
ference in the recall of six and seven digits. 
This result parallels the negligible sex 
differences in recall of digits mentioned by 
Terman (4). Perhaps this indicates that 
many women are unable to retain large 
numbers (thousands or millions). An anal- 
ysis was made of the type of errors in re- 
calling the numbers from the paragraphs. 
It was found that 36% of these errors were 
due to the incorrect placement of the 
significant figures. That is, the S wrote 
“1200” or “190,000” instead of “12,000”; 
or “16,000,” “1,600,000” or “1,016,00” 
instead of “116,000.” This type of error 
constituted 40% of the incorrect responses by 
females and 27% of the incorrect responses 
by males. This difference is not significant 
and it should be realized that these are 
percentages of the incorrect responses 
given by all Ss. That is, it does not include 
Ss who did not write any figure or who 
wrote the correct figure. However, it does 
show that there is need for research on 
the types of numbers that can be handled 
by men and women. If women are able to 
remember seven digits but cannot remember 
five- or six-digit numbers, it is 
important to learn the psychological 
characteristics of numbers qua numbers, 
instead of numbers as unrelated series of 
digits. 

SUMMARY 


This study was undertaken to determine 
whether some differences between male 
and female patients seen in a hospital 
setting in the retention of quantitative 
information would be found in further tests. 
Three groups of Ss were used: 156 hos- 
pital patients, 95 U.S. college students, and 
154 Canadian college students. They were 
given several Wechsler-Bellevue Informa- 
tion items (population of U.S., pints in a 
quart, distance from New York to Paris) 
and a few other items. The results dis- 
closed that the males did better on the 
population and distance items while the 
females performed better on the pints and 
teaspoons item. It was also shown that 
males were better able to retain new quan- 
titative information when tested for im- 
mediate recall. No sex differences were 
found in remembering nonquantitative 
material. A brief digit span test also 
disclosed no sex differences. The implications 
of this for research into selective retention 
were discussed. 

The children’s form of the manifest 
anxiety scale (CMAS) developed by Cas- 
taneda, McCandless, and Palermo (1) 
offers a new and much needed group ad- 
ministered criterion of child behavior at 
the lower school-age levels, specifically for 
the fourth, fifth, and sixth grades. Criteria 
of these kinds are understandably more 
scarce at the first three school grades, a 
fact which prompted the present study. 

Test-retest reliability coefficients 
reported by Castaneda, et al. (1)² ranged 
between .70 and .94 for samples of fourth, 
fifth, and sixth graders on whom the CMAS 
was standardized. In a series of articles 
using the CMAS as a predictor, the same 
investigators reported significant 
relationships between anxiety levels and 
performance on various learning tasks (2, 7), 
preponderantly negative relationships 
between anxiety and popularity (6), and 
evidence to suggest that anxiety is 
meaningfully related to school achievement and 
intelligence for certain grade levels (5). 

The purpose of the present research was 
to repeat essentially the reliability and 
standardization study of Castaneda, et al. 
using a third grade rural level in contrast 
to their samples of fourth, fifth, and sixth 
graders enrolled in a city school system. 

¹ The study was sponsored jointly by the 
Agricultural Experiment Station and the 
College of Home Economics, Department of 
Child Development and Family Relationships 
of the University of Tennessee. 
McCandless, Alfred Castaneda, Ruth High- 

² Not to be confused with Ref. (2); hereafter, 
Castaneda, et al. refers to (1). 


METHOD 


Subjects 


A total of 121 children,³ 64 boys and 57 
girls, enrolled in four third grade class- 
rooms of two rural schools located in an 
East Tennessee county served as Ss. Three 
classrooms were in one school, and the 
fourth in another school. The schools, sep- 
arated by about five miles or less, served 
adjacent communities of less than 2500 
population. Parents of the Ss were from a 
generally low- to middle-socioeconomic 
stratum as judged by occupational data. 
The Ss were about equally distributed as to 
number and sex within classrooms. 


CMAS Description 


The CMAS consists of 53 items.⁴ Forty- 
two items were designated by Castaneda, 
et al. as “anxiety” items and formed the A 
scale (abbreviation imposed by present 
author); 11 items were “... designed to 
provide an index of the subject’s tendency 
to falsify his responses to the anxiety items 
...” (1, p. 318) and were labelled by the 
test authors as the L scale. By definition, 
the higher the A scale score, the higher the 
anxiety; and the higher the L scale score, 
the greater the tendency to falsify re- 
sponses to the A scale. 


³ One female S not included in analyses 
due to absence during second test 
administration. 

⁴ The items and scoring procedures may 
be found in (1, pp. 318-319). 

Procedure 


Two major steps were taken to maximize 
reading ease and comprehension: (a) 
Instructions were altered slightly so the 
teacher could give the items orally as the 
S followed on his own copy; in the 
Castaneda, et al. study, each S read and marked 
the items by himself. (b) Items were triple- 
spaced and typewritten in capitals. The in- 
structions used in the present study are re- 
produced as follows: 


TO BOYS AND GIRLS 


Follow each question carefully as I read 
it aloud to you. When I finish reading each 
question to you, put a circle around the 
word YES if you think it is true about you. 
Put a circle around the word NO if you think 
it is not true about you. Now let us begin. 


The testing program was carried out dur- 
ing the second half of the 1956-57 school 
year. The retest interval was approximately 
one week—seven days for three groups and 
six days for the fourth. 


RESULTS AND DISCUSSION 


Scores obtained on the four groups were 
combined to form a single sample for the 
analyses. Table 1 includes the respective 
means (Ms) and standard deviations (SDs) 
for the first and second A scales (A₁ and 
A₂), and similarly, for the first and second 
L scales (L₁ and L₂). Additionally, Table 1 
contains the test-retest coefficients of 
reliability (Pearson r) for both scales. 
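
As an illustration of the test-retest coefficient referred to here, the following minimal sketch computes a Pearson r between two administrations of a scale. The score lists are invented for the example and are not data from the study; the function name is likewise hypothetical.

```python
import math


def pearson_r(first, second):
    """Pearson product-moment correlation between two paired score lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sums of cross-products and squared deviations about the means
    s_12 = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    s_11 = sum((a - mean_1) ** 2 for a in first)
    s_22 = sum((b - mean_2) ** 2 for b in second)
    return s_12 / math.sqrt(s_11 * s_22)


# Hypothetical A scale scores for six children, tested one week apart
first_test = [12, 25, 18, 30, 9, 21]
second_test = [10, 27, 17, 26, 11, 19]
print(round(pearson_r(first_test, second_test), 2))
```

A coefficient near 1.00 would indicate that children kept nearly the same rank order and spacing on the second administration.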

As the various results are described, they 
will be compared with the Castaneda, et al. 
study. An attempt has been made to re- 
strict comparisons mainly to trends, because 
the two studies concerned different grade 
levels, slightly different administrative in- 
structions, and necessarily different error 
term components in tests of significance. 
Also, Castaneda, et al. presented only in- 
itial test data. 


A and L Scale Ms and SDs 


From Table 1 it is important to indicate 
that: (a) For both A and L, the Ms for 
girls (see Col. 3) were higher than the 
corresponding Ms for boys (see Col. 1) on 
both initial and final tests; and (b) for each 
sex, the Ms of A₂ and L₂ (see Rows 2 and 
6) were less than the corresponding Ms of 
A₁ and L₁ (see Rows 1 and 5). Analyses of 
variance (Sex × Test Order) for the A and 
L scores separately resulted in one 
significant (coefficient of risk, p = .01) main 
effect: the pooled M of L₁ (4.70) was 
significantly higher than the pooled M of 
L₂ (4.02). Both interactions and other main 
effects were nonsignificant. 

TABLE 1 
A AND L SCALE TEST-RETEST MEANS, SDs, AND RELIABILITIES (r) 
FOR POOLED AND SEPARATE SEXES 

                  Boys                 Girls                Pooled sex 
                  M        SD          M        SD          M        SD 
A₁                18.86    7.91        19.37    7.67        19.10    7.80 
A₂                17.95    8.75        18.75    9.22        18.33    8.99 
Pooled A₁-A₂      18.41    8.35        19.06    8.48 
Reliability       r = .82              r = .71              r = .83 
L₁                4.04     2.39        5.11     1.90        4.70     2.20 
L₂                [illeg.] 2.00        4.44     2.29        4.02     [illeg.] 
Pooled L₁-L₂      4.06     2.22        4.77     2.13 
Reliability       r = .65              r = .78              r = .70 

Note. Ns: Boys = 64; Girls = 57. All rs significantly different from zero 
(coefficient of risk, p = .01). 

Using Sex × Grade analyses of variance 
with A₁ and L₁ scores as separate criteria, 
Castaneda, et al. found that girls scored 
significantly higher than boys on both 
scales. Thus, the significant sex differences 
of the Castaneda, et al. study were con- 
firmed in direction by results of the present 
study. 

In terms of magnitude, the Castaneda, 
et al. Ms and SDs of A₁, and the SDs of 
L₁, were very close to the corresponding 
values of the present study. However, the 
Ms of L₁ of the present study were 
approximately twice the size of those reported by 
Castaneda, et al., a finding that suggests an 
interesting hypothesis for further study. 

Recall from above that on the initial 
scales there were higher anxiety levels and 
more falsification than on the final scales 
(compare pooled Ms in right-hand section 
of Table 1). Clinically speaking, it would 
seem logical to expect such a result, since 
there was perhaps some anxiety associated 
with taking the test itself the first time 
which tended to dissipate during the week 
prior to taking the test a second time. The 
L scale may be viewed then as having 
“played the role” of a defense against 
anxiety, so that more falsification occurred 
in the initial test, when Ss were presumably 
more anxious, than during the second test 
when less anxiety about test-taking was 
operating. 


A and L Scale Frequency Distributions 


Frequency polygons, smoothed by the 
method of running averages (3, pp. 52-54), 
were plotted for each sex separately and 
for both sexes combined using the A₁ and 
L₁ scale data. In general, the resultant six 
curves were unimodal, approximately 
bell-shaped, and fairly symmetrical. Most 
curves tended to possess a very slight 
positive skew but generally less skew than 
curve data presented by Castaneda, et al. 
For the pooled A₁ sample of the present 
study: median = 18.44; twentieth 
percentile (P₂₀) = 12.80; and P₈₀ = 26.20. For 
the pooled L₁ sample the same statistics 
were respectively: 4.71; 2.73; and 6.73. 
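
Smoothing a frequency polygon by running averages can be sketched as follows. This is a minimal illustration, not the procedure of reference (3), whose details are not given in the text: it assumes a three-interval running mean and treats the intervals beyond each end of the distribution as having zero frequency. The frequencies and the function name are invented.

```python
def running_average(frequencies, window=3):
    """Smooth class-interval frequencies with a centered running mean.

    Intervals beyond either end are counted as zero frequency
    (an assumption made for this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = [0] * half + list(frequencies) + [0] * half
    return [
        sum(padded[i:i + window]) / window
        for i in range(len(frequencies))
    ]


# Hypothetical frequencies for five adjacent score intervals
raw = [2, 5, 9, 6, 3]
print([round(f, 2) for f in running_average(raw)])
```

Each smoothed value is the mean of an interval's frequency and those of its two neighbors, which removes small irregularities before the curve is judged for modality and skew.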


Curves were not plotted for the A₂ and L₂ 
data, but inspection of the frequency 
distributions revealed them to be very similar 
to the A₁ and L₁ data. Considering the 
frequency data in general, the findings of 
the present study and those of Castaneda, 
et al. were in strong agreement. 


A and L Scale Test-Retest Reliabilities 


The rs between A₁ and A₂ (rA₁·A₂) and 
between L₁ and L₂ (rL₁·L₂), for pooled and 
separate sexes, are shown in Table 1. 
Three noteworthy features of the reliability 
results were: (a) All rs differed significantly 
from zero, represented substantial to 
marked relationships, and were generally 
comparable in size (although slightly lower 
in the case of A) to those of Castaneda, et 
al. (b) Coefficient rA₁·A₂ was higher for 
boys (.82) than for girls (.71), a trend 
consistent with the fourth grade sample of 
Castaneda, et al. but opposite to their fifth 
and sixth grade samples. These combined 
facts suggested the hypothesis that within 
the age ranges included by both studies, 
boys tend to become less consistent in their 
responses to the A scale than do girls, but 
at the same time, the responses of both 
sexes maintain a relatively high level of 
consistency. (c) Due to the lack of significant 
sex differences for both A and L in the 
analyses of variance, although retention of 
the hypothesis of no sex differences does 
not prove it, it seemed reasonable to regard 
the pooled sex rs as the best single reliability 
estimates, viz., rA₁·A₂ = .83 and rL₁·L₂ = 
.70. The corresponding single estimates 
reported by Castaneda, et al. were .90 and 
.70 respectively; thus the two studies agreed 
very closely in this respect. 


Correlation between the A and L Scales 


The assumptions are made that: (a) the 
L scale indicates the tendency for S to 
falsify answers to the A scale; and (b) an 
attempt to falsify could result in either a 
high or low A scale score. Ideally then, 
from the standpoint of measurement, rs 
between the two scales would be zero or 
thereabouts. Correlations were computed 
between A₁ and L₁ and between A₂ and L₂ 
for each sex separately and for both sexes 
combined. Respectively, the rA₁·L₁ and 
rA₂·L₂ coefficients for the pooled sample 
were .14 and .05; for boys, .15 and .09; and 
for girls, .11 and .01. None of the rs was 
significantly different from zero (coefficient 
of risk, p = .01). These rs were generally 
comparable to those of Castaneda, et al.; 
theirs were also nonsignificant, ranging 
between −.11 and .22. 
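
Whether a correlation of this size differs from zero can be checked with the usual t transformation of r. The sketch below is illustrative and makes assumptions not stated in the text: that the pooled coefficients rest on 120 pairs (121 children less the one S excluded), that the test is two-tailed, and that the .01 critical value of t for 118 degrees of freedom is approximately 2.62. The constant and function names are hypothetical.

```python
import math


def t_for_correlation(r, n_pairs):
    """t statistic for testing a Pearson r against zero (df = n_pairs - 2)."""
    return r * math.sqrt(n_pairs - 2) / math.sqrt(1 - r ** 2)


N_PAIRS = 120        # assumed: 121 children less one excluded S
CRITICAL_T = 2.62    # assumed: approximate two-tailed .01 value, 118 df

for label, r in [("A-L, first test", 0.14),
                 ("A-L, second test", 0.05),
                 ("L scale reliability", 0.70)]:
    t = t_for_correlation(r, N_PAIRS)
    print(label, round(t, 2), abs(t) > CRITICAL_T)
```

Under these assumptions the two A-L correlations fall well short of the critical value, while a reliability coefficient of .70 exceeds it by a wide margin, in line with the results reported above.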


General Conclusion 


The evidence obtained strongly supported 
the findings of the test constructors, Casta- 
neda, et al., who standardized their items on 
fourth, fifth, and sixth grade children. The 
principal conclusion drawn from the present 
study was that the A and L scales can be 
reliably employed as criteria using third 
grade rural children taken from populations 
similar to the one included herein. Whether 
or not the items are related to other opera- 
tionally defined concepts (validity) is a 
matter for empirical determination. 


Summary 


The main purpose was to obtain test- 
retest coefficients of reliability of the 
Children’s Form of the Manifest Anxiety 
Scale (CMAS) on a sample of 121 third 
grade rural Ss. The CMAS consisted of 42 
anxiety items (A scale) and 11 falsification 
items (L scale). The scales were adminis- 
tered to four classrooms by the respective 
teachers. The principal results, which were 
compared to those found in the reliability 
study of Castaneda, McCandless, and Pal- 
ermo, are listed as follows: 

1. Pooled estimates of the reliabilities 
(r) of the A and L scales were .83 and .70 
respectively. Correlations between the A 
and L scales approached zero. 

2. Girls scored higher than boys on both 
scales but not significantly. 

3. The general findings gave substantial 
support to those of Castaneda, et al. The 
evidence indicated that the A and L scales 
were sufficiently reliable to be used as cri- 
terion measures for samples from popula- 
tions similar to those employed in the 
study. 

EFFECT OF PERIODICAL SELF-EVALUATION 
ON STUDENT ACHIEVEMENT 


HENRY J. DUEL 
Directorate of Civilian Personnel, USAF Headquarters 


Interest in the application of techniques 
of self-evaluation to education and 
training is a relatively recent development. 
While there has been an increasing number 
of articles which indicate favorable 
experience with self-evaluation, research 
evidence concerning its value is lacking. 
Russell (4), in a 1953 survey of research 
on self-evaluation, reports a lack of 
scientific study of the values of self-evaluation. 
Symonds (5) also indicates there are few 
reports on research results. Rogers (3), 
however, reports favorable experiential 
results with self-evaluation as a mode of 
appraisal. Thus self-evaluation has some 
empirical support but experimental 
evidence of its value is meagre. 


The study reported here was undertaken 
to determine if self-evaluation is of value 
in improving student achievement. In 
other words, does self-evaluation give the 
student a basis for improved functioning 
as a student? 

The study was conducted in two Air 
Force Schools at Scott Air Force Base, 
Illinois. Ss were Air Force enlisted 
students in electronic communications 
courses. Due to similarity of the students’ 
background, age, living conditions, 
aptitudes and to close control of curriculum, 
teaching methods, training materials and 
aids, conditions were considered especially 
favorable for a controlled study. 

In conducting the study, self-evaluation 
instruments developed by the 
experimenter in a previous study were used 
(2). In that study he concluded that in 
schools of this nature, students could 
reliably and validly evaluate gain in skills 
and knowledges achieved in a technical 
course of instruction. 

The instrument (which for convenience 
was called the SET) was one which 
required the student to make an estimate of 
the level of skill or knowledge he possessed 
upon entrance into the course as well as 
the skill or knowledge he attained upon 
completion of the course. A sample item 
from the form is shown below: 


How proficient are you in using a multimeter to measure 
output voltages and currents of a vacuum tube? 

1    2    3    4    5    6    7    8    9 

Descriptive anchors printed along the scale: 

Not prepared to do the job without thorough training 
Am familiar with the job but need considerably more training and practice 
Can do the job with further OJT and close supervision and assistance 
Can do the job if given adequate supervision 
Can do the job with no supervision 


The student responded to each item by 
marking the scale with a check mark 
(✓) to indicate the level of skill he 
thought he possessed at the beginning of 
the course. An X represented his estimated 
attainment at the end of the course. 

Using the SET as a device for self- 
evaluation, an experiment was organized 
in each of two schools. In School A, ap- 
proximately 100 cases were used as con- 
trol. The test group, also 100 cases, used 
the above-described device to evaluate 
themselves at the end of each “test 
point” of instruction. One to three weeks 
elapsed between each test point. Thus 
each student in the test group evaluated 
his progress in the course every one, two 
or three weeks of the course. Each SET 
encompassed skills learned in the previous 
two or three weeks. 

School A consisted of 20 weeks of in- 
struction for six hours per day, five days 
a week. In this case the test was closely 
controlled by the experimenter so that all 
instructors administering the SET gave 
the same instructions and administered it 
in the same manner. 

In School B approximately 75 cases 
were entered as a test group and a like 
number was used as control. In this school, 
the SET was administered by school per- 
sonnel and was handled as a normal part 
of class activity. This was done for the 
purpose of determining the effect of using 
self-evaluation in a “normal” class 
situation and to determine if special conditions 
of the experiment may have had an effect on 
achievement results. 

Student achievement was evaluated by 
regular course tests, used as a basis for 
determining grades. These criterion meas- 
ures were carefully constructed and vali- 
dated. Split-half reliabilities ranged from 
.74 to .90. The criterion measures had 
curriculum validity through construction, 
since “test blueprints” were used to 
derive items directly from curricula 
materials. In addition each item was evaluated 
by at least three experts as to its rele- 
vance to the job for which the man was 
being trained. Other item data including 
discrimination and difficulty indexes were 
used in construction of the tests. 

TABLE 1 
TEST SCORE MEANS AND t RATIOS FOR TEST-CONTROL GROUPS IN SCHOOL A 
(N = 75) 

Raw test score means by check points 

Check point no.       1      2      3       4       5       6      7      8      9 
Test group mean       31.1   34.6   19.4    17.8    39.8    23.1   22.1   30.4   36.5 
Control group mean    29.1   31.0   17.7    16.2    37.5    21.0   21.7   29.2   36.0 
t                     1.61   2.64*  3.26**  2.78**  3.95**  2.64*  .95    1.31   .78 

* Significant at .02 level. 
** Significant at .01 level. 


TABLE 2 
TEST SCORE MEANS AND t RATIOS FOR TEST-CONTROL GROUPS IN SCHOOL B 
(N = 33) 

Raw test score means by check points 

[Check point numbers are incompletely legible in the source; the nine 
columns are given in printed order.] 

Test group mean       23.5   21.3   21.6   18.8   22.9   22.6   23.3   23.2   24.3 
Control group mean    21.8   21.0   20.2   17.4   22.6   17.9   18.9   18.9   22.3 
t                     1.93   .28    1.43   1.88   .36    3.07*  3.45*  3.41*  2.70* 

* Significant at .01 level. 


Results from School A were obtained 
from a total of 75 paired cases which 
remained from the original 100 paired cases. 
Others were lost due to elimination and 
other administrative problems. Students 
in test and control groups were paired by 
aptitude index based on a battery of tests 
given at induction centers. 

Table 1 shows results of all measures 
given to both groups in School A. 

All nine measures indicate a positive 
difference in favor of the test group. Five 
of the nine differences show a t which is 
significant at the one per cent or two per 
cent level. Thus evidence strongly favors 
the test group. If all measures are 
combined by means of the use of multiple 
critical ratio, as suggested by Chapin (1), 
an MCR of 6.40 is obtained indicating a 
highly significant difference in favor of the 
self-evaluating group. 

Results obtained from 33 matched pairs 
in School B are shown in Table 2. 

Results in School B, where “normal” use 
of self-evaluation was attempted, also 
indicate a positive difference in favor of the 
test group on all measures. Measures 
number eight and nine were not used because 
of improper administration. Four of the 
nine differences show a t which is 
significant at the one per cent level. An MCR 
of 6.17 also indicates highly significant 
difference in favor of the self-evaluating 
group. 
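
The way several t ratios may be combined into one multiple critical ratio can be sketched as below. The formula used here, the sum of the separate ratios divided by the square root of their number, is an assumption: Chapin's procedure (1) is not reproduced in the text. It is offered because, applied to the nine School B t ratios of Table 2, it returns the reported value of 6.17. The function name is hypothetical.

```python
import math


def multiple_critical_ratio(t_ratios):
    """Combine separate critical ratios into one.

    Assumed form: sum of the ratios divided by the square root of
    their number (not confirmed against Chapin's own statement).
    """
    if not t_ratios:
        raise ValueError("at least one ratio is required")
    return sum(t_ratios) / math.sqrt(len(t_ratios))


# t ratios for the nine School B measures as printed in Table 2
school_b_t = [1.93, 0.28, 1.43, 1.88, 0.36, 3.07, 3.45, 3.41, 2.70]
print(round(multiple_critical_ratio(school_b_t), 2))  # 6.17
```

Because every ratio enters the sum with its sign, a run of small but consistently positive differences can yield a large combined ratio even when few of the separate tests reach significance.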


Summary and Conclusions 


This study was accomplished to 
determine the effect of periodic self-evaluation 
on student achievement. Students in two 
military technical schools periodically 
evaluated their skill and knowledge during 
a course of instruction, under careful con- 
trol of the experimenter in one school and 
as part of the “normal” class activities in 
the second. Achievement of test groups on 
regular school tests was compared with 
that of control groups which were 
matched by aptitude. The results favored 
the self-evaluation group, with multiple 
critical ratios being statistically significant 
in both schools. The results lead to the 
conclusion that in this particular situation 
students, given formal and periodic op- 
portunities to evaluate themselves, can 
achieve to a greater degree than students 
not having such opportunity. 

The study also raises several questions: 
Does a device such as the one used furnish 
additional motivation, sharpen percep- 
tions of the objectives to be achieved, or 
result in better organization of previous 
learning on which future learning is 
based? These and similar questions should 
furnish a basis for future study in the area 
of self-evaluation. 

AN EXPERIMENTAL EVALUATION OF THE 
OPEN BOOK EXAMINATION 


RICHARD A. KALISH¹ 


University of Hawaii 


Part of the contemporary pedagogical 
trial-and-error efforts to improve tech- 
niques of classroom testing has focused 
upon the use of the open book examina- 
tion. In such an examination the student 
is allowed to make use of any materials at 
his disposal, including textbooks, lecture 
notes, and dictionaries, but does not ob- 
tain answers either directly or indirectly 
from other students. 

Tussing (2) summarizes the various ar- 
guments for using the open book test as: 


1. The test can be constructed and used 
in all the various forms that the traditional 
test can be used; 2. Much of the fear and 
emotional blocks encountered by the stu- 
dent is removed; 3. Emphasis is placed upon 
the practical problems and reasoning, and 
less emphasis is placed upon pure memory 
of facts and items; 4. Cheating with cribs 
and other devices is eliminated; 5. This ap- 
proach is more adaptable to evaluating 
student attitudes and posing the question of 
what action should be taken on social 
issues. 


Some of the arguments opposing the use 
of the open book should also be recog- 
nized; namely, (a) It is likely to reduce 
study by allowing some students to feel 
that the use of the book will enable them 
to “slide through” with a minimum of 
study; (b) There is some reason to believe 
that a certain amount of rote memory 
may bring about the overlearning so often 
necessary to a full understanding of a sub- 
ject; (c) Note-passing and looking at the 
test paper of a nearby student is made 
easier in the confusion of looking through 
papers and books; (d) A more superficial 
knowledge of the material is encouraged. 

¹ The author wishes to express his thanks 
to W. E. Vinacke and John Digman for 
their help with the manuscript. 


PROBLEM 

The present study is an attempt to de- 
termine the equivalence of two approaches 
to the administration of examinations; 
namely, the conventional closed book ver- 
sus the open book. The general hypotheses 
are: (a) The open book examination will 
lead to fewer student errors; (b) The 
open book examination will measure 
different abilities than those assessed by the 
closed book tests; and (c) There is no 
correlation between student ratings of the 
help received from open book examina- 
tions and their test scores. 

The first hypothesis is based on the ap- 
parent truism that the opportunity to 
look up material at its source should pro- 
vide greater accuracy of response than 
depending upon memory. The null hy- 
pothesis may be stated as follows: An ex- 
perimental group, receiving an open book 
examination, will not differ significantly 
in terms of total errors from a control 
group which receives the same examina- 
tion under the traditional method. 

The second hypothesis is based on the 
assumption that certain individuals will 
do better work on a closed book test while 
others will do relatively better on an open 
book examination, the differences being 
functions of differential responses to the 
pressure of the examination situation, an 
altering of motivation in studying, the 
ability to make organized use of texts and 
notes, etc. Specifically, then, the null 
hypothesis will state: The correlation 
between two closed book examinations will 
not differ significantly from the 
correlation between an open book and a closed 
book examination, assuming the sets of 
examinations and testing conditions are 
equivalent in every way. 
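
One common way to compare two correlations obtained from independent groups is Fisher's r-to-z transformation. The sketch below illustrates that technique only; the text does not say which test the author applied. The correlations are invented, and the group sizes of 74 and 84 are simply the two section sizes reported for this study, assigned here without knowing which section is which. Function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation."""
    return math.atanh(r)


def z_for_difference(r_1, n_1, r_2, n_2):
    """Normal deviate for the difference between two independent rs."""
    se_diff = math.sqrt(1 / (n_1 - 3) + 1 / (n_2 - 3))
    return (fisher_z(r_1) - fisher_z(r_2)) / se_diff


# Hypothetical values: closed-closed r in one section, closed-open r in the other
r_closed_closed, n_a = 0.60, 74
r_closed_open, n_b = 0.45, 84
z = z_for_difference(r_closed_closed, n_a, r_closed_open, n_b)
print(round(z, 2), abs(z) > 1.96)
```

With these invented values the deviate stays below 1.96, so the null hypothesis of equal correlations would be retained at the .05 level (two-tailed).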


The third hypothesis was based 
primarily upon the “educated guess” of the 
investigator which, in turn, was based on 
casual observations of the grades of 
students taking open book tests. The null 
hypothesis states: There will be no difference 
in number of errors on open book 
examinations between groups of students who 
rate open book examinations as being 
helpful and those who rate them as being 
nonhelpful. In this case, it is predicted 
that the null hypothesis will be accepted. 


METHOD 


Subjects. The subjects were 158 
students at the University of Hawaii, 85% 
[illegible] and 75% sophomores. Seventy-four 
of the students were enrolled in one 
section of child psychology, while the 
remaining 84 were enrolled in another 
section of the same course with the same 
instructor. 
Examinations. Two mid-term examinations, 
approximately six weeks apart, were 
given to both sections. Each examination 
consisted of 50 questions, all multiple-choice 
items with five alternative responses, 
only one of which was acceptable 
as correct. About one half the items were 
based on the class lectures; the other half 
were drawn from the text. Included were 
items which were distinctly factual in 
nature, those items which attempted to get at 
understanding of relationships, and items 
which measured an understanding of 
terminology. It is believed that the tests 
were fairly typical of college tests of the 
multiple-choice variety. 
he students were allowed 50 minutes 
ted the examination, from the time the 
StS were completely distributed until 
vas time they were collected. The score 
s S the total number of errors, a high 
ore thus indicating a low grade. 
Tocedure. The two sections, meeting in 
ke Same room at successive hours, were 
th en the same six-week examinations on 
© same day at successive hours. Both 
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sections had covered the same material 
during the class periods, had the same as- 
signments, and received the same exami- 
nations. Opportunity for communication 
between sections was eliminated by keep- 
ing the students from the first section 
in the classroom until the end of the ex- 
amination period, then releasing them by 
a back door while ushering the second 
class in through the front. There was 
little if any possibility for passing on 
questions and answers. 

The first section to meet (Class A) was 
given the usual closed book examination 
for both six-week examinations. The sec- 
ond section (Class B) was given the same 
examinations, but only the first examina- 
tion was in normal closed book form. At 
the first class meeting following the exami- 
nation it was announced for the first time 
that the next examination would be “open 
book.” Class B’s second examination was 
taken with the use of textbooks, notes, 
and dictionaries. Otherwise, testing con- 
ditions were exactly the same as for Class 
A. 
Most students in Class A had from 15 
to 20 minutes remaining after completing 
the second examination, so it was assumed 
that most students in Class B should have 
had approximately that much time to look 
for answers in the material available to 
them. 

At the close of the examination period, 
Class B was asked to indicate how much 
help the open book procedure provided 
by writing “None,” “Little,” “Some,” or 
“Much” on their answer sheet. 

Replication. A replication, comparable 
in every way except one, was conducted 
with 161 students, divided into Class A’ 
(N = 79) and Class B’ (N = 82). The 
one way in which the replication differed 
from the original study was that the two 
classes were not held in the same class- 
room and that communication between the 
two groups was not so well controlled. It 
is still highly unlikely that communication 
study.
RESULTS


To test the first null hypothesis, it was 
first necessary to determine whether the 
two sections (Class A and Class B in the 
original experiment, Class A’ and Class B’ 
in the replication) were comparable in 
ability. This was accomplished by com- 
paring their scores on the first examina- 
tion, taken by both groups under the same 
conditions. Since the first and the second 
examinations were not necessarily of equal 
difficulty, the effects of the open book ex- 
amination had to be measured in terms of 
the differences between the sections on the 
two examinations. These data are con- 
tained in Table 1. 

As can be observed in Table 1, the 
scores were relatively the same for Class 
A and Class B (and for Class A’ and Class 
B’) on the first and second examinations. 
Although in each case the Experimental 
Group obtained approximately a one- 
half point relative increase under the ex- 
perimental conditions, the difference is far 
from statistically significant. 

Therefore, Null Hypothesis 1 must be 
accepted. It would appear from the results 
that, under the given conditions, the
opportunity to use text and lecture
materials resulted in no difference in
total errors.

To test the second null hypothesis, 
Pearson product-moment correlations 
were computed for each class. The sig- 
nificance of the difference between cor- 
relations for Class A and Class B (and 
for Class A’ and Class B’) was then 
computed. The obtained correlations and 
their significance of difference levels are 
shown in Table 2. In both the original 
experiment and the replication, the cor- 
relation for the control condition was 
substantially higher than for the experi- 
mental condition (r’s = .691 and .579 
as opposed to .495 and .460), which was 
as hypothesized. 
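
The comparison of two independent correlations can be sketched as follows. The article does not say exactly how its test statistic was computed; this sketch assumes the common Fisher r-to-z approach, and it assumes class sizes of 74 and 84 for the original experiment (the text does not say which section was which, but the standard error is the same either way). The function name is illustrative only.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Return (z, two_tailed_p) for the difference between two
    independent Pearson correlations, using Fisher's r-to-z transform."""
    z1 = math.atanh(r1)  # Fisher transform of the first correlation
    z2 = math.atanh(r2)  # Fisher transform of the second correlation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed normal probability of a deviate at least this large
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Original experiment: control r = .691, experimental r = .495
# (assumed sizes 74 and 84; the assignment to sections is an assumption)
z_exp, p_exp = fisher_z_difference(0.691, 74, 0.495, 84)

# Replication: Class A' (N = 79, r = .579), Class B' (N = 82, r = .460)
z_rep, p_rep = fisher_z_difference(0.579, 79, 0.460, 82)

print(round(z_exp, 2), round(p_exp, 3))
print(round(z_rep, 2), round(p_rep, 3))
```

Under these assumptions the two ratios come out close to the values of 1.90 and 1.00 reported in Table 2, which suggests, without proving, that a procedure of this kind was used.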

Although neither difference was
significant with the two-tailed t test (t =
1.90 and 1.00), both were in the expected 
direction. The probabilities of the two 
experiments were combined according to 
the chi square method for independent 
samples suggested by Gordon, Loveland, 
and Cureton (1). The obtained chi 
square was 11.841, which is significant 
beyond the .02 level of confidence with 
four degrees of freedom. 
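
A minimal sketch of combining probabilities from independent samples is given below. It assumes the combination takes the familiar form of minus twice the sum of the natural logarithms of the separate probabilities, referred to chi square with two degrees of freedom per sample; the helper names are hypothetical. The exact probabilities the author entered are not given in the text, so the sketch is not expected to reproduce 11.841 from Table 2. It does confirm that a chi square of 11.841 with four degrees of freedom lies beyond the .02 level.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of chi square for an even number of
    degrees of freedom, using the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def combine_probabilities(p_values):
    """Combine independent probabilities: -2 * sum(ln p) is referred
    to chi square with 2 degrees of freedom per probability."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return statistic, df, chi2_sf_even_df(statistic, df)


# Probability associated with the chi square reported in the article
p_reported = chi2_sf_even_df(11.841, 4)
print(round(p_reported, 4))
```

Calling `combine_probabilities` with a list of one probability per experiment returns the statistic, its degrees of freedom, and the combined probability.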


TABLE 1
MEAN NUMBER OF ERRORS ON EXAMINATIONS

                            Mean Number    Mean Number
                            Errors         Errors
                            Both Closed    Experimental
                            Book           Condition
                            (Exam I)       (Exam II)      Change
Experiment
  Control (Class A)         10.24          15.68           5.44
  Experimental (Class B)     8.23          13.10           4.87
  Difference                 2.01           2.58           0.57
Replication
  Control (Class A’)        13.51          13.41          -0.10
  Experimental (Class B’)   15.56          14.88          -0.68
  Difference                 2.05           1.47           0.58
TABLE 2
CORRELATIONS BETWEEN SCORES ON FIRST EXAMINATION AND SECOND
EXAMINATION AND TESTS OF SIGNIFICANCE BETWEEN CORRELATIONS

                                           t Between    Level of
                            Correlation    groups       confidence
Experiment
  Control (Class A)         .691
  Experimental (Class B)    .495           1.90         .056
Replication
  Control (Class A’)        .579
  Experimental (Class B’)   .460           1.00         .316
Combined                                                P < .02
TABLE 3
CHANGES IN NUMBER OF ERRORS FROM FIRST EXAMINATION TO SECOND
EXAMINATION FOR THE EXPERIMENTAL GROUPS AS A FUNCTION
OF ATTITUDES REGARDING OPEN BOOK METHODS

            Extent of Help Received From Open Book Examination

            Much help     Some help     Little help   No help
            N   Change    N   Change    N   Change    N   Change
Class B     6   +5.16     33  +4.74     26  +4.96     4   +5.50
Class B’    9   -1.78     42  +0.12     15  -0.53     1   +2.00


Therefore, the second null hypothesis
was rejected. It appears that a
significantly lower correlation is obtained
when an open book examination is given
following a closed book examination than
when both examinations are the closed
book type.

For the final null hypothesis, the
students were asked to indicate whether
they felt the open book examination had
been “Much,” “Some,” “Little,” or “No”
help. As may readily be observed in
Table 3, there was virtually no difference
among the four groups of students
indicating the four attitudes towards open
book tests.

Not only are the differences among the
groups slight, but for the first study,
those who felt the open book was “Little”
help did relatively better than those who
felt it was “Much” help; in the
replication a similar reversal occurred
between “Some” help and “Little” help.

Therefore, the third null hypothesis was 
accepted, which was in accordance with 
the corresponding general hypothesis. It 
appears that the feelings of students re- 
garding help given by an open book ex- 
amination are not reflected in measured 
grade changes. 


Discussion 


This study has investigated the equiv- 
alence of the open book examination and 
the closed book examination. The results 
have indicated that, although under the 
conditions of this experiment the group 
average scores are not affected by the 
examination approach, the two types of 
examinations measure significantly dif- 
ferent abilities. 

While recognizing the obvious dangers 
of over-generalizing, it is felt by the
author that the experimental situation is
sufficiently “typical” of college
examinations of the multiple-choice variety
to have widespread applicability.

It can be assumed, therefore, that some 
students do relatively better on open 
book examinations, while others do better 
on closed book examinations. Use of the 
open book technique, thus, would appear 
more rewarding to certain students than 
to others, and the belief that there is 
no difference between the two types of 
examination is of dubious validity. 

Therefore, if an instructor feels, with 
Tussing, that the open book examination 
is a more valid measure due to the de- 
crease in reliance on memory and de- 
traction from cheating, the open book 
approach would be most appropriate. On 
the other hand, if he feels that the closed 
book type provides more study motiva- 
tion and encourages a less superficial ap- 
proach to a course, he will undoubtedly 
adhere to the traditional examination. 

This study has shown that the two 
types of examinations measure signifi- 
cantly different abilities. It will now be 
necessary to investigate what factors dif- 
ferentiate students who are successful on 
each of the types of tests, so that in- 
structor decisions might be based on more 
complete information. 
SUMMARY


An investigation was made of the 
equivalence between open book examina- 
tions and closed book examinations. 

Two traditional closed book examina- 
tions were administered to a class of 
University of Hawaii students; the same 
examinations were administered to an- 
other section of students taking the same 
course with the same instructor, differing 
only in that the second examination was 
open book. A replication is also reported. 

Three hypotheses were tested: 1. The
open book examination will lead to fewer 
student errors; 2. The open book exam- 
ination measures different abilities than the 
closed book examination; 3. Student rat- 
ings of the help received from open book 
examinations will not be related to ex- 
amination scores. The first hypothesis was 
not substantiated, but the second and 
third hypotheses were verified. 
THE STANDARD ERROR OF MEASUREMENT
OF THE DIFFERENCE BETWEEN A SUM 
SCORE AND ONE OF ITS PARTS 
FREDERICK B. DAVIS 
Hunter College 


For proper interpretation of an
individual’s test scores it is sometimes
necessary to ascertain the significance of
the difference between a total score,
consisting of the sum of several part
scores, and one of those part scores. For
example, the Level-of-Comprehension score
from the Davis Reading Test (2) is based
on the first 40 of the 80 items that
determine the Speed-of-Comprehension score.
A difference between individual speed and
level scores should be evaluated in terms
of the standard error of measurement of
the difference between a sum (the
Speed-of-Comprehension score) and one of
its parts (the Level-of-Comprehension
score).

For the convenience of clinical and
school psychologists, the equations for
computing the standard error of a
difference between overlapping total and
part scores obtained by an individual drawn
at random from a specified group will be
presented first and their use illustrated
with data from the Davis Reading Test. The
derivation of these new equations will
then be provided.


PRACTICAL PROCEDURES 


Let T represent an individual’s raw
score made up of m parts. Then T = A +
B + ... + M. Let I represent any part of
sum T, and P any part of sum T except I.
Differences between sum T and any one
of its parts are inconvenient to interpret
unless all of the scores are made
comparable. For purposes of this discussion,
comparable scores are defined as
transformed raw-score values for which the
corresponding true-score points are
exceeded by the same percentage of examinees
in a defined sample. The desired raw-score 
values may be determined by the method 
given by Flanagan (3, pp. 752-760). 
They are transformed simply to make 
numerically identical comparable scores 
for which corresponding true-score points 
are exceeded by the same percentage of 
examinees in the defined sample. For 
example, for Form A of the Davis Reading 
Test a speed score of 31 and a level score 
of 21 are raw-score values for which the 
corresponding true-score points are
exceeded by about 61 per cent of the
examinees in the equating sample (which
comprised 4,692 students in Grades 11 and 
12 and the freshman year of college). 
These two raw-score values have been 
transformed into comparable scores of 75. 
It should be made clear that comparable 
scores, as defined above, are not neces- 
sarily measures of the same abilities or 
equally reliable. 

Fortunately, total and part scores are 
often expressed in serviceable approxima- 
tions to comparable scores. For example, 
Verbal, Performance, and Full-Scale IQ’s 
from the Wechsler intelligence scales 
are expressed in units such that their 
means are approximately 100, their 
standard deviations about 15, and the 
shapes of their distributions nearly normal. 
Similarly, total and part scores from the 
Cooperative Achievement Tests are ex- 
pressed in Scaled Scores that have means 
and standard deviations (in a defined 
hypothetical group) of 50 and 10, respec- 
tively, and distributions that are closely 
normal. 
Suppose that the raw scores (T, A,
B, ..., M, as well as I and P) of the
individual mentioned above are expressed
in comparable form and denoted $Z_T$,
$Z_A$, $Z_B$, ..., $Z_M$; and that $Z_I$
denotes a comparable score on any part of
sum T and $Z_P$ any comparable part score
except $Z_I$. Then it should be noted that
$Z_T \neq Z_A + Z_B + \cdots + Z_M$. The
standard error of measurement of the
difference between $Z_T$ and $Z_I$ may be
written as:


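$$ s_{\mathrm{meas}(Z_T - Z_I)} = \frac{s_{(Z_T - Z_I)}}{s_{(T - I)}} \sqrt{\sum^{m-1} s_{\mathrm{meas}_P}^2} \qquad [1] $$

or,

$$ s_{\mathrm{meas}(Z_T - Z_I)} = \frac{s_{(Z_T - Z_I)}}{s_{(T - I)}} \sqrt{\sum^{m-1} s_P^2 (1 - r_{PP})} \qquad [2] $$

or,

$$ s_{\mathrm{meas}(Z_T - Z_I)} = s_{(Z_T - Z_I)} \sqrt{1 - r_{(T-I)(T-I)}} \qquad [3] $$

where:

$$ s_{(Z_T - Z_I)} = \sqrt{s_{Z_T}^2 + s_{Z_I}^2 - 2 s_{Z_T} s_{Z_I} r_{TI}} \qquad [4] $$

$$ s_{(T - I)} = \sqrt{s_T^2 + s_I^2 - 2 s_T s_I r_{TI}} \qquad [5] $$

$$ r_{(T-I)(T-I)} = \frac{s_T^2 r_{TT} + s_I^2 r_{II} - 2 s_T s_I r'_{TI}}{s_T^2 + s_I^2 - 2 s_T s_I r_{TI}} \qquad [6] $$

and

$$ r'_{TI} = r_{TI} - \frac{s_I (1 - r_{II})}{s_T} \qquad [7] $$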


In the preceding equations, $s_T^2$,
$s_I^2$, and $s_P^2$ and $r_{TT}$,
$r_{II}$, and $r_{PP}$ are the variances
and reliability coefficients, respectively,
of these variables in the original
raw-score units of measurement. The
correlation of sum scores and any given
set of part scores, expressed in these
original units of measurement, is denoted
as $r_{TI}$. Variances of the transformed
comparable scores are denoted as
$s_{Z_T}^2$, $s_{Z_I}^2$, and $s_{Z_P}^2$.

Whether the difference $Z_T - Z_I$ for
any pupil chosen at random from the
group tested may be regarded as a chance
deviation from a true difference of zero
at any desired level of confidence may be
determined with serviceable accuracy by
means of the critical ratio:

$$ CR = \frac{(Z_T - Z_I) - 0}{s_{\mathrm{meas}(Z_T - Z_I)}} \qquad [8] $$


Choice among Equations [1], [2], and
[3] for computing the standard error of
measurement of a difference depends on
which one can be employed most
conveniently with the data available. To
determine the standard error of measurement
of the difference between the speed and
level scores from the Davis Reading Test
for a college freshman drawn at random
from a group tested, Equation [1] is most
convenient. The test results give the
standard errors of measurement of these
scores, in terms of the original raw-score
units of measurement, as 5.5 for speed
and 3.7 for level. Equation [12], therefore,
yields (5.5)² - (3.7)², or 16.56, as the
numerical value under the radical sign in
Equation [1]. Numerical values of the
terms in Equations [4] and [5], also given
in the manual, lead to a value of .35 for
the ratio of $s_{(Z_T - Z_I)}$ to
$s_{(T - I)}$. The standard error of
measurement of the difference turns out to
be 1.4. When this value is used in
Equation [8], a difference of 2 points is
found to be significant at about the 15
per cent level and one of 3 points at
about the 3 per cent level. For a college
student drawn at random from the group
tested, a counselor or teacher would be
justified in concluding that a difference
between his speed and level scores of
3 points or more should be attributed to
causes other than chance. It may occasion
some surprise to find that the standard
error of measurement of the difference
between these two comparable scores is so
small. This is largely accounted for by the
fact that errors of measurement in them
are positively correlated.
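
The arithmetic of this illustration can be retraced with a short sketch. The standard errors of 5.5 and 3.7 and the ratio of .35 are the figures quoted in the text; the function names are hypothetical, and the use of a two-tailed normal probability for the critical ratio is an assumption about how the significance levels were read.

```python
import math


def sem_difference(sem_total, sem_part, sd_ratio):
    """Standard error of measurement of the difference between a sum
    score and one of its parts, in comparable-score units.

    sem_total, sem_part: raw-score standard errors of measurement
    sd_ratio: ratio of the standard deviation of comparable-score
              differences to that of raw-score differences
    """
    under_radical = sem_total ** 2 - sem_part ** 2
    return sd_ratio * math.sqrt(under_radical)


def critical_ratio_p(difference, sem_diff):
    """Critical ratio against a true difference of zero, with its
    two-tailed normal probability (an assumed reading of the test)."""
    cr = difference / sem_diff
    return cr, math.erfc(abs(cr) / math.sqrt(2.0))


sem_diff = sem_difference(5.5, 3.7, 0.35)
print(round(sem_diff, 1))

# Illustrative differences evaluated with the rounded value of 1.4
for points in (2, 3):
    cr, p = critical_ratio_p(points, 1.4)
    print(points, round(cr, 2), round(p, 3))
```

With these inputs the quantity under the radical is 16.56 and the standard error rounds to 1.4, in agreement with the figures in the text.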


DERIVATION OF EQUATIONS


Let T represent a raw total score made
up of m parts: A, B, ..., M. Then T =
A + B + ... + M. Let I represent any
part of T and let P represent any part of
T except I. Assume that an indefinitely
large number of parallel forms of the
tests from which these raw scores are
derived are given to a pupil drawn at
random from a grade group for which the
tests are appropriate and postulate that
this pupil’s true scores in the abilities
measured remain constant throughout the
testing. An essentially normal distribution
of differences between T and I would
then be obtained. The mean of the
distribution would approach $T_t - I_t$ (the
difference between the pupil’s true scores),
and its variance could be written as:

$$ s_{(T_t + T_e) - (I_t + I_e)}^2 = s_{T_t}^2 + s_{T_e}^2 + s_{I_t}^2 + s_{I_e}^2 + 2 s_{T_t} s_{T_e} r_{T_t T_e} + 2 s_{I_t} s_{I_e} r_{I_t I_e} - 2 s_{T_t} s_{I_t} r_{T_t I_t} - 2 s_{T_t} s_{I_e} r_{T_t I_e} - 2 s_{T_e} s_{I_t} r_{T_e I_t} - 2 s_{T_e} s_{I_e} r_{T_e I_e} \qquad [9] $$


where the subscript t denotes a true score
and the subscript e an error of
measurement.

Since we postulated that the pupil’s true
scores remain constant, $s_{T_t}^2$ and
$s_{I_t}^2$ are both equal to zero.
Consequently, Equation [9] may be
simplified to:

$$ s_{\mathrm{meas}(T - I)}^2 = s_{T_e}^2 + s_{I_e}^2 - 2 s_{T_e} s_{I_e} r_{T_e I_e} \qquad [10] $$


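It can easily be shown that the
coefficient

$$ r_{T_e I_e} = \frac{s_{I_e}}{s_{T_e}} \qquad [11] $$

By definition, $s_{T_e}^2$ is equal to
$s_{\mathrm{meas}_T}^2$ and $s_{I_e}^2$ is
equal to $s_{\mathrm{meas}_I}^2$. Therefore,
Equation [10] may be simplified to:

$$ s_{\mathrm{meas}(T - I)}^2 = s_{\mathrm{meas}_T}^2 - s_{\mathrm{meas}_I}^2 \qquad [12] $$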
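
Equation [12] can be checked numerically with a small simulation. The sketch below is only an illustration under stated assumptions: three parts with error standard deviations of 1, 2, and 3 (arbitrary values, not taken from the article), errors that are normal and uncorrelated across parts, and true scores held constant so that only errors vary across parallel forms.

```python
import random
import statistics

random.seed(0)

ERROR_SDS = [1.0, 2.0, 3.0]  # assumed error SDs of the parts (illustrative)
N_FORMS = 20000              # number of simulated parallel forms

# Independent errors of measurement for each part on each form
part_errors = [
    [random.gauss(0.0, sd) for _ in range(N_FORMS)] for sd in ERROR_SDS
]

# Error in the sum score T is the sum of the part errors
total_errors = [sum(values) for values in zip(*part_errors)]

# Take the first part as I; error in the difference T - I
i_errors = part_errors[0]
diff_errors = [t - i for t, i in zip(total_errors, i_errors)]

# Left side: squared standard error of measurement of T - I
lhs = statistics.pvariance(diff_errors)
# Right side: squared SEM of T minus squared SEM of I
rhs = statistics.pvariance(total_errors) - statistics.pvariance(i_errors)

print(round(lhs, 2), round(rhs, 2))
```

Both quantities should fall near 13, the sum of the error variances of the two parts other than I, apart from sampling fluctuation.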


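If we make the usual assumption that the
correlation of errors of measurement of
separate tests will, under proper conditions
of test administration, be zero, we may
write:

$$ s_{\mathrm{meas}_T}^2 = s_{T_e}^2 = s_{(A + B + \cdots + M)_e}^2 = s_{A_e}^2 + s_{B_e}^2 + \cdots + s_{M_e}^2 + 0 = \sum^{m-1} s_{\mathrm{meas}_P}^2 + s_{\mathrm{meas}_I}^2 \qquad [13] $$

If a substitution is made for
$s_{\mathrm{meas}_T}^2$ in Equation [12], we
obtain:

$$ s_{\mathrm{meas}(T - I)}^2 = \sum^{m-1} s_{\mathrm{meas}_P}^2 + s_{\mathrm{meas}_I}^2 - s_{\mathrm{meas}_I}^2 = \sum^{m-1} s_{\mathrm{meas}_P}^2 \qquad [14] $$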


If sum T and each of its parts are
transformed into comparable scores, as
defined previously, we obtain Equations
[1] and [2] by multiplying each side of
Equation [14] by

$$ \frac{s_{(Z_T - Z_I)}^2}{s_{(T - I)}^2} $$

and substituting

$$ s_P^2 (1 - r_{PP}) $$

for $s_{\mathrm{meas}_P}^2$.

Equation [3] is a specific application of
the well-known relationship:

$$ s_{\mathrm{meas}_X} = s_X \sqrt{1 - r_{XX}} $$
Equations [4], [5], and [6] are well known
and the derivation of Equation [7] has
been published by the writer (1).


REFERENCES


1. Davis, F. B. Note on part-whole
correlation. J. educ. Psychol., 1958, 49,
77-79.

2. Davis, F. B., & Davis, C. C. Davis
Reading Test. Series 1, for High School and
College Students. Forms A, B, C, and D.
New York: Psychological Corp., 1957.

3. Flanagan, J. C. Units, scales, and norms.
In E. F. Lindquist (Ed.), Educational
measurement. Washington: American
Council on Education, 1951.
A NOTE ON SEX EQUALITY IN THE INCIDENCE
OF LEFT-HANDEDNESS¹


WAYNE DENNIS
Brooklyn College


It is commonly accepted that
comparisons of behavior in different
cultures may provide data which are
decisive for psychological theories. Yet the
number of cross-cultural studies which have
been used to test hypotheses is small. The
present study may provide an additional
demonstration of the value of cultural
comparisons.

As background data let us note that 
reviews of data on handedness such as 
those by Wile (1) and Hildreth (2) show 
that while the frequency of left-handedness 
varies from activity to activity, in almost 
all studies the use of the left hand is more 
common among males than among females. 
The number of left-handed males has
been found to exceed the number of
left-handed females by 50% or by an even
greater amount. This difference is present
at least by four years of age and perhaps
earlier.

The consistency of this finding suggests 
that this sex difference may have a bio- 
logical basis. However, practically all of 
the investigations of hand preference have 
been conducted in Europe and America. 
For this reason the possibility exists that 
the lower frequency of left-handedness 
among women than among men is not 
biologically determined but rather that it 
may be a consequence of stronger social 
pressures against the use of the left hand
among females than among males in
western countries. In view of these rival
interpretations of the data just reviewed,
it seems worthwhile to report that in one
Near Eastern country and probably in a
much wider area the sex difference in
handedness found in western countries does
not exist.

¹ This study was conducted while the
author was a visiting professor at the
American University of Beirut. Expenses of
the investigation were defrayed by a grant
from the Rockefeller Brothers Fund. Adele
Hamdan Taky Din and Leila Biksmati served
as research assistants.

The present study was conducted in 
the schools of Beirut, Lebanon, in 1955-56. 
Eleven schools were studied. In each class- 
room, a research assistant observed the 
handedness of children when they were 
engaged in writing in connection with their 
usual school work. In each school all grades 
from the kindergarten through Grade 5 
were observed. Some of the schools were 
coeducational, others were not. In the case 
of noncoeducational schools, an attempt 
was made to match boys’ schools and girls’ 
schools with respect to socioeconomic class 
and religious affiliation. 

A total of 2,656 pupils, 1,430 boys and
1,226 girls, were observed. The frequency
of left-handedness was found to be 5.0% 
among the boys and 4.9% among the girls. 
The small difference is statistically insig- 
nificant at the 5% level. 
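
The size of this difference relative to its sampling error can be illustrated with a two-proportion comparison. The note does not say which test was used, so the pooled normal-approximation test below is an assumption, and it works from the rounded percentages because the raw counts of left-handed pupils are not reported.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z test; returns (z, two_tailed_p)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# 5.0% of 1,430 boys versus 4.9% of 1,226 girls (rounded percentages)
z, p = two_proportion_z(0.050, 1430, 0.049, 1226)
print(round(z, 2), round(p, 2))
```

Under these assumptions the ratio falls far short of the 1.96 needed for significance at the 5% level, which is consistent with the statement in the text.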

There is no reason to suppose that there 
are biological differences between Western 
and Near Eastern populations which would 
affect sex differences in handedness. The 
explanation of the difference between the 
finding just reported above and earlier 
findings probably is to be found in dif- 
ferences in social norms and child rearing 
practices. It seems likely that in the Near 
East left-handedness is no more repre- 
hensible in women than it is in men. How- 
ever, the precise attitudes and cultural 
conditioning in respect to handedness in 
this area can be identified only by further 
research. 




In further investigations it will be de- 
termined how widespread among Near- 
Eastern peoples is the sex equality in 
handedness ratios found in Lebanon. 

This study suggests that it may be pos- 
sible for a society to produce more sin- 
istrality among females than among males. 
Whether such a society can be found re- 
mains to be determined. 


REFERENCES 


1. Wile, I. S. Handedness: Right and left.
Boston: Lothrop, Lee and Shepard,
1934.

2. Hildreth, Gertrude. The development
and training of hand dominance. J.
genet. Psychol., 1949, 75, 197-220, 221-
254, 255-275; 1950, 76, 39-100, 101-144.


Received April 4, 1958. 


Journal of Educational Psychology
Vol. 49, No. 4, 1958 


INSTRUCTOR EFFORT TO INFLUENCE: AN EXPERIMENTAL 
EVALUATION OF SIX APPROACHES¹


E. PAUL TORRANCE 


Bureau of Educational Research, University of Minnesota 


AND 
RAIGH MASON 
L. G. Hanscom Air Force Base, Massachusetts 


Instructors are frequently confronted
with problems concerning the extent to
which they should attempt to influence
students in their attitudes and other
behaviors having strong emotional
overtones. They are uncertain about what
techniques of influence are legitimate and
the extent to which they should be
persistent. Some educational leaders are
currently calling for teachers to be more
persistent in their efforts to influence
pupil behavior. Others maintain that
high-pressure methods cause resistance or
that such methods do violence to our
democratic ideals.

Although most supervisors and
instructors readily admit the importance of
attitudes and other behaviors having strong
emotional overtones, teachers generally
have been reluctant to attempt to
influence such behaviors. Especially have
they shrunk from direct influence
attempts. There has been a fairly pervasive
attitude that a student’s personal and
social attitudes, emotional reactions, and
the like are his personal business. This
has been especially true in regard to
matters affecting physical and mental
health, eating, sleeping, sexual behavior,
and the like. Seldom have such matters
been chosen for scientific investigation
and appropriate research situations have
not been readily accessible.

¹ This report is based on work done under
Project No. 7723, Task No. 77461, in
support of the research and development
program of the Air Force Personnel and
Training Research Center, Lackland Air
Force Base, Texas. Permission is granted
for reproduction, translation, publication,
use, and disposal in whole and in part by
or for the United States Government. The
opinions or conclusions expressed or implied
herein are those of the authors. They need
not be construed as necessarily reflecting
the views or endorsement of the Department
of the Air Force or of the Air Research
and Development Command.

Little or no scientific information exists 
to guide the decisions instructors and 
supervisors must make in deciding what 
kinds of effort to influence should be 
made. In social psychology, attention has 
been focused upon influence among group 
members (5), the influence of group 
norms (1), and the influence of
associates in buying, politics, and the like (4).
In the sales field, much has been said 
(2) about “low-pressure” selling in con- 
trast to “high-pressure” selling. More 
recently, “no-pressure” selling appears to 
be coming into prominence (2). Little 
scientific research of an experimental na- 
ture has accompanied these trends, how- 
ever. 

One difficulty which has hampered re- 
search concerned with emotional reac- 
tions has been the unavailability of sat- 
isfactory criteria. Too frequently, it has 
been necessary to accept verbal expres- 
sions concerning such reactions. Even 
when it has been possible to obtain other 
behavioral measures, there has been doubt 
concerning the “real” emotional reaction 
behind the overt behavior. The authors 
have been fortunate in having access to 
a situation which provided a variety of 
criteria, including verbalized attitudes, 
overt behavior, and an indicator of emotional
response. The experimental situa-
tion involves the use of a survival ration 
commonly known as pemmican in the 
simulated survival exercise of the USAF 
Survival Training School. Use of the ra- 
tion almost always elicits a wide range 
of response from extremely unfavorable 
to extremely favorable. Since the ration 
is recognized by most authorities in the 
field as the best available one for use 
in most survival situations, its use in 
training should increase its acceptability 
and, in fact, does (9). 

An earlier study by the authors (11) 
gave a somewhat discouraging picture 
of the instructor’s ability to influence the 
acceptability of the ration. When given 
scientifically developed information about 
the psychological, social, and training 
factors related to the ration’s accept- 
ability and asked to use this information 
on behalf of their crews, aircrew com- 
manders (indigenous leaders) were far 
more successful than the crew instructors 
(11). Furthermore, those instructors who 
made the most effort to influence ac- 
ceptability (as measured by statements 
made by both the instructor and the 
trainees) tended to obtain the lowest 
acceptability. Sustained efforts by indig- 
enous leaders, however, were rewarded 
by increased acceptance. 

Instructors in this and other situations 
frequently must face the very realistic 
problem of influencing the attitudes and 
emotionally toned behaviors of their stu- 
dents. Thus, it is evident that there is 
a need for a clearer understanding con- 
cerning what it is that instructors do 
which produces negative effects and what 
they can do to exert more positive in- 
fluence. The purpose of the present study 
was to evaluate experimentally six al- 
ternative procedures by which training 
instructors may influence the acceptabil- 
ity of pemmican. 
PROCEDURES


The Ss of the study were 427 aircrew- 
men undergoing survival training. All Ss 
received a double issue of the emergency 
ration consisting of a total of eight meat 
bars (pemmican) supplemented by chili 
and onion powder, two cereal bars, two 
fruitcake bars, 16 cubes of sugar, and 
eight packets each of soluble coffee and 
tea. During the nine-day simulated sur- 
vival, escape, and evasion exercises, train- 
ees were able to supplement these rations 
to some extent by such native foods as 
porcupine, crawfish, wild onions, water 
cress, camus, and the like. 

A total of 43 instructors in the two 
successive classes were involved. Prior to 
the exercise, the training groups (crews 
consisting of 9 or 12 men each) were 
divided randomly into one control and 
six experimental groups. In each class, 
three training groups were involved in 
each of the experimental groups. In the 
first class, four groups were assigned to 
the control condition and in the second, 
three groups. 

Instructors were briefed by three
experienced psychologists thoroughly
familiar with survival ration indoctrination
and other aspects of the program of the
USAF Survival Training School. The
general purposes and design of the study
were explained briefly. The instructors
were asked to forgo their usual
indoctrination procedures and use only the
technique they would be assigned.
Instructors then met in groups of three or
four, as the case might be, with one of
the experimenters to discuss the technique
to which they had been assigned. Prior
to the discussion, each instructor
completed a questionnaire in which he
indicated his personal reaction to the ration
and described his usual indoctrination
procedures. Each instructor was also
given a typed sheet of instructions to be
used as a guide in carrying out his
assigned technique. The members of the
control group were subjected to only the
normal influences of the training
situation.


The six experimental conditions may be 
described briefly as follows: 


Experimental 1 (No Influence).
Instructors were briefed to make no effort to
influence trainees to accept the ration. They
were instructed to say as little about it as
possible, assuming a rather neutral stand
on acceptability. They were cautioned,
however, to avoid giving any impression of
personal dislike for the ration. (This
condition was designed to follow up a clue
obtained from an exploratory study which
indicated that trainees perceiving “no
influence” attempts on the part of their
instructors responded more favorably than
those perceiving various degrees of efforts
to influence.)

Experimental 2 (Good Example).
Instructors were briefed to make no direct
attempts to influence trainees to accept
pemmican. They were issued a supply of
the ration and instructed to make a
definite attempt to manifest a definitely
favorable attitude by personal example. This
was done by eating the ration and
casually expressing favorable reactions to it.
They were cautioned to make no appeal
to the trainees to eat the ration. (This
condition was designed to evaluate the
effectiveness of the often used admonition
to instructors to “set a good example” and
“never ask your students to do anything
that you do not do.”)

Experimental 3 (Information). Instructors
were asked to give information about the
value of the meat bar as an emergency
ration and about ways of preparing it. They
were instructed to give this information
in an objective, factual, “take-it-or-leave-it”
manner and to give no information
about psychological reactions. (This
condition was designed to evaluate what was
considered a “low-pressure” technique of
influence.)

Experimental 4 (Group Explanation). In
addition to giving facts about values and
ways of preparation, instructors were asked
to emphasize the psychological factors
which affect acceptability and to explain
why this particular ration is used in the
training exercise. The information about
psychological influences was derived from
previous research by the authors (8, 9, 10).
(This condition was designed to test the
value of using information gained through
research to provide a psychological
explanation of behavior.)

Experimental 5 (Individual Explanation).
Instructors were briefed as in Experimental 
4 except that they were asked to work 
with individuals as individuals instead of 
with the group. They were asked to appear 
natural and casual, but sincere, in their 
attempt to exercise personal influence. 

Experimental 6 (Evaluation). Instructors 
were briefed to use the mildly coercive 
method of informing trainees that they 
would be “graded down” if they did not 
“really” try the ration. They were instructed
to explain that failure to eat the ration
was an indication of poor “will-to-survive,” 
failure to take adaptive action, failure to 
take care of essential survival needs, failure 
to “play the game,” etc. (This condition 
was designed to evaluate the effectiveness 
of using evaluation as a device for mo- 
tivating or influencing behavior.) 


Following the field exercise, all Ss were
administered a questionnaire to obtain
measures of acceptability and to provide
additional facts concerning the conditions
existing during the experiment.
Acceptability items included: (a) the traditional
hedonic scale (7-point), requiring the S
to indicate his reactions to each of five
methods of preparing pemmican; (b) the
number of bars eaten; (c) reasons for not
eating the remainder (made me sick, too
greasy, smells bad, etc.); and (d) the
conditions under which the S would use
pemmican in the future. Previous research
(10) had indicated that each of these
items correlates significantly with and
contributes importantly to an over-all index
of rejection.

This over-all index of rejection was
obtained by combining the items in the
following manner. The ratings from the
hedonic scales were weighted from one
point for “like extremely” to seven points
for “dislike extremely.” If S indicated
that he had not tried the bar according to
one or more methods, the mean rating for
the methods tried was assigned. One point
was scored for each bar not eaten but
no extra credit was awarded for eating
more than the number of bars issued.
Reports of having “been made sick” added
five points and each of the other reasons
for not eating the remainder of the bars
was scored one point. Five points were
added for responses of “would eat only
when extremely hungry” and 10 points for
“would not eat even if very hungry.”
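
The scoring rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual scoring procedure: the text says the items were "combined" but does not give the formula, so a simple sum is assumed; the function name, its parameters, the response labels in the dictionary, and the treatment of an S who rated no method at all are hypothetical.

```python
from statistics import mean

# Points for the future-use item; any other response is assumed to add nothing.
FUTURE_USE_POINTS = {
    "only when extremely hungry": 5,
    "not even if very hungry": 10,
}


def rejection_index(ratings, bars_issued, bars_eaten, made_sick,
                    other_reasons, future_use):
    """Sum the component scores into a single rejection index (assumed additive).

    ratings       -- five hedonic ratings, 1 (like extremely) to 7 (dislike
                     extremely); None marks a preparation method not tried
    other_reasons -- count of reasons checked other than "made me sick"
    """
    tried = [r for r in ratings if r is not None]
    if not tried:
        # The source does not cover this case; refusing to score is an assumption.
        raise ValueError("no preparation method was rated")
    fill = mean(tried)  # untried methods receive the mean of the methods tried
    hedonic = sum(r if r is not None else fill for r in ratings)
    uneaten = max(bars_issued - bars_eaten, 0)  # no credit for extra bars eaten
    sick = 5 if made_sick else 0
    return (hedonic + uneaten + sick + other_reasons
            + FUTURE_USE_POINTS.get(future_use, 0))
```

For instance, an S who rated three methods 3, 4, and 5, left three of ten issued bars uneaten, checked two reasons other than sickness, and would eat the ration "only when extremely hungry" receives 20 + 3 + 2 + 5 points under these assumptions.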


RESULTS AND CONCLUSIONS 


First, an effort was made to determine
the over-all effects of the six experimental
influence techniques. Means and standard
deviations for the Rejection Index and
number of meat bars consumed and numbers
and percentages for “made sick” and
intention to eat the ration in the future
“whenever hungry” for each condition are
shown in Table 1. Using Bartlett’s test,
the requirements for homogeneity of
variance are not met for either the
Rejection Index or the number of meat bars
consumed. Over-all chi squares indicate
significant differences among the various
conditions for both “made sick” and
intention to eat the ration in the future
“whenever hungry.”

The heterogeneity of variance for Re- 
jection Index and number of meat bars 
consumed seems to be due primarily to 
the small dispersion in Experimental 6. 
Table 2 presents the F ratios between Ex- 
perimental 6 and each other condition. It 
will be noted that all of the F ratios are 
significant at the .05 level or better for 
Rejection Index. Only the F ratio between 
Experimental 2 and Experimental 6 fails 
to reach significance for number of meat 
bars consumed. In general, then, it may 
be concluded that all of the conditions are 
more erratic in their effects than Exper- 
imental 6. 

Using the method described by Edwards
(3, pp. 272-274) to correct for variance,
direct tests were made to compare the
means for Experimental 6 with the means
for each other condition. The t ratios thus
obtained are presented in Table 2. Using
Rejection Index as the criterion,
Experimental 6 appears to produce greater
acceptability than the other conditions
except Experimental 4 (Group Explanation).
Using number of meat bars consumed as
the criterion, Experimental 6 achieved
results significantly superior to all
conditions except Experimental 3
(Information) and the Control Condition.


TABLE 1

MEANS AND STANDARD DEVIATIONS OF REJECTION INDEXES AND NUMBER OF
MEAT BARS CONSUMED AND PERCENTAGE MADE SICK AND INTENDING TO USE
BAR IN FUTURE FOR EACH CONDITION (EXPERIMENTAL AND CONTROL)

                      |       |  Rej. Index   | Bars Consumed |   Made Sick    |   Eat in Fut.
Condition             | No.   | Mean  | SD(a) | Mean  | SD(a) | No. | Pctg.(b) | No. | Pctg.(c)
Control               | 76    | 26.58 | 11.43 | 7.22  | 3.02  |  8  | 10.5     | 2[illegible]
Exp. 1 (No infl.)     | 57    | 29.23 | 12.60 | 5.66  | 3.07  | 15  | 26.3     | [illegible]
Exp. 2 (Good Ex.)     | 63    | 32.92 | 12.04 | 5.66  | 2.34  | 16  | 25.4     | 18  | 28.6
Exp. 3 (Info.)        | 62    | 27.73 | 12.93 | 7.95  | 4.60  | 11  | 17.7     | 38  | 61.3
Exp. 4 (Grp. Expl.)   | 61    | 25.61 | 11.54 | 6.75  | 2.94  | 17  | 27.9     | 32  | 52.5
Exp. 5 (Indiv. Expl.) | 65    | 31.92 | 12.96 | 5.57  | 5.28  | 19  | 29.2     | 17  | 26.2
Exp. 6 (Evaluation)   | 43(d) | 21.95 |  8.57 | 7.79  | 1.17  |  3  |  7.0     | 24  | 55.8

(a) Using Bartlett’s test, requirements for homogeneity of variance not satisfied (p < .001).
(b) Chi square = 16.759; df = 6; p < .02.
(c) Chi square = 27.227; df = 6; p < .01.
(d) Number of Ss in Exp. 6 was reduced by eliminating one crew whose instructor
was replaced by an unbriefed instructor after the beginning of the experiment.


TABLE 2

F RATIOS AND t RATIOS BETWEEN EXPERIMENTAL 6 AND EACH OTHER CONDITION
FOR REJECTION INDEX AND NUMBER OF MEAT BARS CONSUMED

                      |    Rejection Index    |     Bars Consumed
Condition             | F ratio | t ratio(b)  | F ratio | t ratio(b)
Control               | 2.15(a) | 2.48(a)     | 1.83(a) | 1.14
Exp. 1 (No infl.)     | 1.96(a) | 3.40(a)     | 1.93(a) | 3.92(a)
Exp. 2 (Good Ex.)     | 2.26(a) | 5.43(a)     | 1.12    | 4.61(a)
Exp. 3 (Info.)        | 1.80(a) | 2.73(a)     | 4.33(a) | 0.23
Exp. 4 (Grp. Expl.)   | 2.27(a) | 1.84        | 1.77(a) | 2.00(a)
Exp. 5 (Indiv. Expl.) | 1.76(a) | 4.77(a)     | 5.71(a) | 2.71(a)

(a) Significant at the .05 level or better.
(b) Corrected for variance according to the method described by Edwards (3, pp. 272-274).

Further direct tests were made by
comparing the Control Condition with each
other condition. As already shown,
Experimental 6 achieved significantly better
results than the Control Condition using
the Rejection Index as the criterion.
Experimental 2 (Good Example) and
Experimental 5 (Individual Explanation),
however, appeared to produce significant
negative effects (t ratios = 3.154 and
2.605 respectively, both significant at
better than the .05 level).

When direct tests are made by applying
chi-square analysis to the “made sick”
criterion, the results are similar to those
obtained for number of bars consumed.
Experimental 6 again is superior at the
.05 level or better to all conditions except
the Controls and Experimental 3. Using
intention of using the ration in the future
as the criterion, Experimentals 3 and 4
produce results on a par with
Experimental 6.

Several features concerning
Experimental 3 (giving objective information
about the value of the ration and
describing methods of preparation) need to be
noted. This experimental condition was
accompanied by slightly higher mean
consumption and willingness to eat the ration
“whenever hungry” than even
Experimental 6. The differences, however, fall
far short of statistical significance. Using
the latter criterion to compare
Experimental 3 with the Controls, however,
results obtained by Experimental 3 are
superior (chi square = 7.31, significant at
better than the .01 level). Experimental
3, however, appears to be quite erratic
in its effects as indicated by the
relatively large standard deviations of
Rejection Index and number of meat bars
consumed.


DISCUSSION


In interpreting the results of this study, 
inescapable difficulties of experimental re- 
search in this area need to be made ex- 
plicit. First, the Ss under the control
condition cannot be regarded as “untrained.”
They were subjected to varying
degrees and kinds of influence. Question- 
naire responses indicated that all of the 
control instructors conducted indoctrina- 
tion concerning survival rations. It might 
even be argued that instructors in the ex- 
perimental conditions were unpracticed 
and perhaps unskilled in the techniques 
which they were asked to use. It is cer- 
tainly not contended that the instructors 
of the experimental groups were perfect in 
their adherence to the technique assigned. 
Nevertheless, the checks made indicated 
reasonable adherence to the assigned con- 
dition. 

In general, the results of this study
support the leads obtained from the
previous studies to which reference has been
made. It is interesting to note that the
two methods having significant boomerang
effects are those relying most heavily on
personal influence. A number of
explanations might be advanced. The
explanation which best satisfies the authors is
that the boomerang effect resulted from
the phenomenon of “negative
identification” discussed by Torrance and Ziller*
in an earlier paper. According to this
explanation, trainees perceive instructors as
different from themselves and different in
ways which prevent close identification.
The trainee is a member of an aircrew.
The instructor is not. He is an
“earthbound” man. The instructor is something
of a woodsman and is comfortable in the
out-of-doors; usually, the trainee is not
and frequently cannot imagine any
“normal” person as being. The instructor is
relatively young and in outstanding
physical condition; usually the trainee is older
and in comparatively poor physical
condition [...] recommended by the instructor.
These two techniques may also be regarded
as “indirect” [...] employed in
Experimentals 3 and 6.

The experimenters’ first impulse upon
examining the results concerning the
su[...] them. Every attempt, of course, had been
made in advance to maintain as rigorous
controls as possible. The sampling, the
indoctrination of instructors, and the
collection of the criterion data had been
accomplished as carefully as possible. The
instructors did not see the completed
blanks and Ss were not required to sign
their names, so there was little chance of
threat to the trainee. Someone suggested,
however, that trainees in Experimental 6
might have buried or destroyed some of
their bars in order to make a good
impression upon the instructor, since he was
grading them on their use of it. Upon
investigation, however, it was ascertained
from independent witnesses that some of
the crews in Experimental 6 had
exhausted completely their supply of the
ration and had bartered additional bars
from other crews. For example, one crew
bartered 33 additional bars and another
20 from other crews which did not
con[...] essentially the same results.

* Torrance, E. P., & Ziller, R. C. Negative
identification in groups as a function of
personality differences. Reno, Nevada:
Survival Methods Branch, Air Force
Personnel and Training Research Center, Stead
Air Force Base, March 1956. (Laboratory
Note CRL-LN-210.)

Again, a number of alternative
rationales might be advanced to explain the
superiority of Experimental 6. Some
might argue that men in our culture have
been conditioned to respond favorably to
this mildly coercive technique. If this
were the only explanation, however, one
would expect more evidence of “behavior [...]

It should not be concluded that
instructors should avoid the “good example”
and other techniques of personal influence.
According to our interpretation, however,
such techniques are likely to boomerang
if trainees identify negatively with the
instructor. Even Experimental 4 (Group
Explanation) is probably influenced by
this phenomenon. The rationale for
Experimentals 4 and 5 was taken from
Stefansson’s experiences in indoctrinating
members of his exploration parties
concerning “arctic hysteria” (7). He
maintained that newcomers to the Arctic were
simply not bothered by “arctic hysteria”
if they were given a satisfactory
explanation of its psychological basis. It is likely
that these young explorers identified
strongly with Stefansson, accepted his
explanation, and were influenced by it.

The findings are probably applicable to
situations in which instructors need to
influence attitudes and other behaviors
with strong emotional overtones. In
general, it would appear that instructor
attempts to influence should be of the
direct, “take-it-or-leave-it” variety and
should be made in the instructor’s “official”
rather than “personal” role. Although the
influence of associates may be far stronger
than that of instructors, the findings of
this study do suggest that instructors may
play significant roles in influencing
attitudes and other behaviors having strong
emotional overtones and that this can be
a fruitful area of research. It is possible
that the findings of this study can be
generalized to other conceptually similar
situations where it is desirable to influence
attitudes and behavior, particularly in
educational situations. It is also likely that
some of the findings may apply to influence
situations in such activities as selling. The
findings are quite in accord with theories
which have been developed in the past
decade concerning the superiority of
“low-pressure” sales techniques. Naturally, all
of these findings need to be tested in other
situations.


SUMMARY 


A sample of 427 aircrewmen
participating in a survival exercise were divided
randomly into seven groups (six
experimentals and one control). Crew
instructors of the experimental crews were
requested to conduct the survival-ration
indoctrination according to specific
instructions. Using four criteria of
acceptance of the ration, an experimental
condition making the food indoctrination a
regular part of the training accompanied
by evaluation tended to produce superior
results. Promising results were also
obtained from a “low-pressure” technique
relying chiefly upon objective information
and straightforward instructions
concerning preparation. Significant negative
effects were obtained from conditions
relying upon personal persuasiveness, setting
an example, and the like.


REFERENCES 


1. Berkowitz, L. Group norms among
bomber crews: Patterns of perceived
crew attitudes, “actual” crew attitudes,
and crew liking related to aircrew
effectiveness in Far Eastern combat.
Sociometry, 1956, 19, 141-153.

2. Bursk, E. C. Thinking ahead: Drift to
no-pressure selling. Harvard Bus. Rev.,
1956, 34, 25-32ff.

3. Edwards, A. L. Statistical methods for
behavioral sciences. New York:
Rinehart, 1954.

4. Katz, E., & Lazarsfeld, P. F. Personal
influence. Glencoe, Ill.: Free Press,
1955.

5. Lindzey, G. (Ed.) Handbook of social
psychology. Cambridge:
Addison-Wesley, 1954.

6. McNemar, Q. Psychological statistics.
(2nd ed.) New York: John Wiley,
1954.

7. Stefansson, V. Arctic manual. New
York: Macmillan, 1953.

8. Torrance, E. P. Training factors
affecting survival ration acceptability. In
Conference Notes, Food Research and
Development Coordination Conference,
Wright-Patterson Air Force Base,
Ohio, 9-10 October 1956. Chicago:
Quartermaster Food and Container
Institute for the Armed Forces, 1957.
Pp. 74-90.


218 E. PAUL TORRANCE AND RAIGH MASON


9. Torrance, E. P. Sensitization versus
adaptation in preparation for
emergencies: Prior experience with an
emergency ration and its acceptability
in a survival situation. J. appl.
Psychol., 1958, 42, 63-67.

10. Torrance, E. P., & Mason, R.
Psychological and sociological aspects of
survival ration acceptability. J. clin. Nutr.,
1957, 5, 176-179.

11. Torrance, E. P., & Mason, R. The
indigenous leader in changing attitudes
and behavior. Int. J. Sociometry, 1956,
1, 23-28.

Received June 2, 1958.


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 49, No. 4, 1958


OVERLAP AMONG DESIRABLE AND UNDESIRABLE
CHARACTERISTICS IN GIFTED CHILDREN


GORDON LIDDLE
University of Chicago


Terman’s study of the gifted has shown
that, in general, highly intelligent children,
in addition to being larger and healthier,
are also somewhat better adjusted socially
than the average child. His gifted group,
including those who had been accelerated
in school, carried these advantages into
adult life (3). More recently, the Ford
Foundation’s Fund for the Advancement
of Education found that a group of gifted
children coming to college two years early
had adjusted as well socially and
emotionally to college life as had their
classmates (2).

At present there is considerable
discussion in educational circles of the
advantages and disadvantages of
acceleration and special grouping as administrative
tools in meeting the needs of gifted
children. Many school administrators look
with disfavor on these techniques. When
asked why, they usually give a reply
which implies that the social adjustment
of gifted children is rather fragile. Is
this fear justified? Are gifted children
more often or less often subject to
severe maladjustment than other children?

The purpose of this study is to examine
the overlapping of talents and
maladjustments in a group of 1015 public school
children in late childhood and early
adolescence. The research is part of a
10-year action-research project being carried
out by the Committee on Human
Development of the University of Chicago.


PROCEDURES


The population of the study comprised
the entire public school population of the
fourth and sixth grades in a Midwestern
city of 45,000 in the school year 1951-52,
the first year of the study. For each child
included in the population, the following
characteristics were measured: aggressive 
maladjustment, withdrawn maladjustment, 
social leadership ability, artistic talent, 
and intellectual ability. Tests designed to 
measure all these characteristics were ad- 
ministered during the first year of the 
study. The tests measuring the first three 
characteristics were readministered dur- 
ing the second and fourth years of the 
study. Children for whom test informa- 
tion was incomplete were excluded from 
this study. 

Two tests were used in determining 
aggressive maladjustment, withdrawn mal- 
adjustment, and social leadership ability. 
One is the “Who Are They?” (W.A.T.), 
a sociometric instrument based on chil- 
dren’s evaluations of their peers with re- 
spect to these three behavioral character- 
istics (1). A child’s leadership score was 
determined in response to questions such 
as, “Who are the leaders, the leaders in 
several things?” “Of the people you run 
around with, who are the ones who come 
up with good ideas of interesting things 
to do?” Aggressiveness was determined by 
nominations to questions such as, “Who 
are the boys and girls that seem to be 
against everything that is suggested—the 
gripers?” “Who are the bullies, the boys 
and girls who try to push others around?” 
The following questions are typical of those 
contributing to the withdrawn score, “Who 
are the ones that are too shy to make 
friends easily? It is hard to get to know 
them.” “Who are the boys and girls who 
usually come and go alone and stay by 
themselves most of the time, even though 
they aren’t trouble makers?”

The other instrument used to measure
aggressiveness, withdrawnness, and
leadership was the “Behavior Description Chart”
(B.D.C.), a forced-choice teacher rating
instrument. Here teachers had to pick
the items “most like” and “least like” a
given child in a series of 10 groups of five
statements each, such as the following:

A. Other people find it hard to get 
along with him. 

B. Is easily confused. 

C. Other people are eager to be near 
him or on his side. 

D. Is usually willing to go along with 
the group. 

E. Interested in other people’s opinion 
and activities. 

In the foregoing pentad, if A was thought 
to be the statement “most like” this 
child, this contributed to his aggressive 
score. If B was thought to be most typical, 
this contributed to his withdrawn score. 
Item C is a leadership item, and D and E 
are not scored since they are presumed to 
be typical of average children. Similarly 
a “least like” nomination for A, B, or C 
subtracted from the child’s score on that 
variable. 
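
The keying of this pentad can be illustrated with a short Python sketch. It is a hedged illustration only: the source says a "most like" choice contributed to a score and a "least like" choice subtracted from it, but gives no weights, so unit weights are assumed, and the names `PENTAD_KEY` and `score_pentad` are hypothetical.

```python
# Keyed statements of the sample pentad; D and E are not scored.
PENTAD_KEY = {"A": "aggressive", "B": "withdrawn", "C": "leadership"}


def score_pentad(most_like, least_like, key=PENTAD_KEY):
    """Return the contribution of one pentad to the three scores.

    A keyed "most like" choice adds one point and a keyed "least like"
    choice subtracts one point (unit weights are an assumption).
    """
    scores = {"aggressive": 0, "withdrawn": 0, "leadership": 0}
    if most_like in key:
        scores[key[most_like]] += 1
    if least_like in key:
        scores[key[least_like]] -= 1
    return scores
```

Under these assumptions, a teacher marking A as "most like" and C as "least like" adds one point to the child's aggressive score and takes one point from his leadership score, while choices of D and E leave all three scores unchanged.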

Each individual was given a percentile
score for aggressiveness, withdrawnness,
and social leadership ability on each of
the two tests administered in each of the
three years. Because high scores for one
year might be unduly affected by a
temporary upset in the child’s life or an
atypical relationship with one of his
teachers, it was thought best to add the six
percentile scores obtained from the two
tests and divide by six to get a local
mean percentile score. This was done for
each of the three behavioral
characteristics.


TABLE 1

CORRELATIONS OF PERCENTILE SCORES
FROM TWO YEARS’ TESTS

                         |            Tests
Characteristics          | Who Are They? | Behavior Description
Aggressive maladjustment |     .40       |        .54
Withdrawn maladjustment  |     .47       |        .63
Social leadership ability|     .74       |        .63
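
The averaging of the six percentile scores for one characteristic can be sketched as follows. This is a minimal illustration; the function name and the split of the inputs into W.A.T. and B.D.C. lists are assumptions made for readability.

```python
def mean_percentile(wat_scores, bdc_scores):
    """Average the six percentile scores for one characteristic.

    wat_scores -- the three yearly "Who Are They?" percentiles
    bdc_scores -- the three yearly Behavior Description Chart percentiles
    """
    scores = list(wat_scores) + list(bdc_scores)
    if len(scores) != 6:
        raise ValueError("expected three scores from each of the two tests")
    return sum(scores) / 6
```

A child with W.A.T. percentiles of 90, 80, and 70 and B.D.C. percentiles of 60, 50, and 40 would receive a mean percentile of 65, so that a single atypical year moves the combined score by at most one sixth of its deviation.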

The scores from the first two years of 
testing have been utilized for a reliability 
study. Product-moment correlations be- 
tween the two sets of percentile scores are 
reported in Table 1. 

It should be remembered that this is a
severe reliability measure, since in the 
second year the children were in different 
classrooms, with different teachers, and 
with from 25 to 60 per cent turnover in 
classroom membership. 

It was noticed that the children who 
ranked high in any one category generally 
ranked in the same category on subsequent 
tests, but that considerable shifting occurs 
in the relative positions of the low-rank- 
ing children. Measured leadership ability 
remained more constant from one year to 
the next than did the maladjustment 
characteristics. 

For all three characteristics, the top
10 per cent of the children received half
of all the nominations on the W.A.T.
Thus, this instrument differentiates quite
clearly among those children displaying
each characteristic to a high degree, but
does not differentiate among those who
seldom display the characteristic being
measured. The B.D.C. yielded a rather
similar distribution of scores.

Intellectual talent was determined 
through use of both tests of “general” in- 
telligence and tests of such “specific mental 
abilities” as could be measured in children 
of 10 or 12 years of age. Also an effort was 
made to include some tests which were 
thought to be more “culture-fair”; that 
is, tests which did not discriminate against 
the children of lower socioeconomic status
groups. 

The following tests were used for each
child: the Science Research Associates
Primary Mental Abilities Test (P.M.A.) for
ages 7-11, the Davis-Eells Games, the
Goodenough Draw-A-Man Test, the Thurstone
Concealed Figures Test, and the verbal,
spatial, and reasoning subtests of the
Chicago P.M.A. for ages 11-17.

The percentile scores on the seven
intellectual measures were averaged. This
was a rather arbitrary decision, but it
might be said that the use of a
multiple-regression equation was discarded since
this method requires an accepted,
independent criterion of talent with which the
screening instruments could be correlated.
Academic achievement test scores or
academic grades could have been used as
criteria, but there was little reason to suppose
that they would have been better than the
test score itself as a criterion.

Artistic talent was determined by
asking a group of local artists to rate four
pictures drawn by each child. These
pictures were: a classroom as seen from the
doorway, a landscape, a free assignment
to draw the child’s favorite subject, and
the Goodenough Draw-A-Man Test scored
with different criteria from those used to
score it as an intelligence test.

After the testing had been completed, 
the 10% of the total group displaying 
each of the five characteristics to the 
highest degree were set aside, and it is 
these top 10% groups which will be in- 
vestigated in this study. 


RESULTS 


Table 2 points out that children who are 
talented in one area are quite likely to be 
talented in other areas, but are quite un- 
likely to be seen as highly maladjusted. 
Chi square was used in determining the 
statistical significance of the differences 
between observed and expected frequencies 
of overlapping among the five
characteristics.
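
As an illustration of this comparison, the expected overlap between two top-10% groups and a chi square for the corresponding fourfold table can be computed as below. This is a sketch under assumptions: the article does not state which chi-square formula was used, so the ordinary uncorrected 2 x 2 formula is assumed, and the function name is hypothetical.

```python
def overlap_chi_square(observed, n_a, n_b, n_total):
    """Expected overlap of two groups and the uncorrected 2 x 2 chi square.

    observed -- children who fall in both groups
    n_a, n_b -- sizes of the two groups
    n_total  -- size of the whole population
    """
    expected = n_a * n_b / n_total   # overlap expected by chance
    a = observed                     # in both groups
    b = n_a - observed               # in group A only
    c = n_b - observed               # in group B only
    d = n_total - a - b - c          # in neither group
    chi2 = (n_total * (a * d - b * c) ** 2
            / (n_a * (n_total - n_a) * n_b * (n_total - n_b)))
    return expected, chi2
```

With the leadership (N = 104) and intellectual (N = 107) groups from Table 2 and 45 children observed in both, the sketch gives an expected frequency of about 10.96 and a chi square of about 131.6, close to the tabled values, which suggests (but does not establish) that a computation of this kind underlies the table.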

Table 2 shows that:

1. Social leadership ability is positively
related to the other talents and negatively
related to the two maladjustment
characteristics. There are 76 instances of
overlapping between social leadership and the
other talent areas, but no overlapping with
the two maladjustment areas.

2. Intellectual talent is significantly
related to the other talents and is almost as
surely negatively related to the
maladjustment characteristics. There are 78
instances of overlapping with the other two
talent areas, but only 6 instances of
overlapping with the two maladjustment
characteristics. Intellectual talent and social
leadership ability overlapped more than
four times as often as would be expected
on the basis of chance occurrence.

3. Artistic talent is highly related to
the other talent areas, but while there is
a negative relationship between artistic
talent and the maladjustment
characteristics, this relationship is not statistically
significant. There are 64 instances of
overlapping with one of the talent areas, while
there are 12 instances of overlapping with
one of the maladjustment categories.

4. The overlapping between withdrawn
and aggressive maladjustment is not
statistically significant.


TABLE 2
OVERLAPPING OF TALENT AND MALADJUSTMENT CATEGORIES
(1015 children)

               | Leadership | Intellectual | Artistic  | Withdrawn     | Aggressive
Characteristic | Ability    | Ability      | Talent    | Maladjustment | Maladjustment
               | (N = 104)  | (N = 107)    | (N = 102) | (N = 103)     | (N = 101)
Leadership     |    —       |              |           |               |
Intellectual   |            |     —        |           |               |
  Observed     |  45.00*    |              |           |               |
  Expected     |  10.96     |              |           |               |
  χ²           | 131.66     |              |           |               |
Art            |            |              |    —      |               |
  Observed     |  31.00*    |  33.00*      |           |               |
  Expected     |  10.45     |  10.75       |           |               |
  χ²           |  50.05     |  57.23       |           |               |
Withdrawn      |            |              |           |      —        |
  Observed     |    .00*    |   3.00*      |  5.00     |               |
  Expected     |  10.55     |  10.86       | 10.35     |               |
  χ²           |  13.08     |   7.08       |  3.42     |               |
Aggressive     |            |              |           |               |      —
  Observed     |    .00*    |   3.00*      |  7.00     |   8.00        |
  Expected     |  10.35     |  10.65       | 10.15     |  10.25        |
  χ²           |  12.81     |   6.76       |  1.21     |    .61        |

* 1% level of confidence

Since there is a possibility that only the
extremely intellectually gifted have severe
adjustment problems, it was decided to
examine the 51 children with the highest
intellectual scores, the top 5%. Only two
of the 51 children were in the top 10%
in one of the maladjustment categories,
while there were 41 instances of
overlapping with the talent areas. Thus, the
top 5% in intellectual talent had more
overlapping with the other talents and less
overlapping with the maladjustment
categories than did the second 5%.


TABLE 3
INTERCORRELATIONS OF TALENT AND
MALADJUSTMENT CATEGORIES

Variables    | Intellectual | Leadership | Withdrawn   | Aggressive
Intellectual |     —        |   .49*     |  -.45*      |   .05
Leadership   |    .37*      |    —       |  -.76*      |  -.05
Withdrawn    |   -.28*      |  -.61*     |    —        |  -.29*
Aggressive   |   -.11       |  -.23*     | [illegible] |    —

Note.—Coefficients for girls (N = 143) above diagonal;
for boys (N = 130), below diagonal.
* 1% level of confidence

For readers who are interested in the 
correlations of these characteristics 
throughout their entire range, Table 3 
presents these correlations for 273 of the 
children who were in the sixth grade at 
the beginning of the study. It is believed 
that the correlations for the entire 1015 
children would be quite similar. 

In interpreting Table 3 it must be re- 
membered that the tests used to measure 
leadership and the maladjustment char- 
acteristics were set up to screen out those 
children displaying a given characteristic 
to a high degree and were not intended 
to differentiate between children display- 
ing these characteristics to a lesser degree. 

The table indicates that intellectual 
ability and social leadership ability are 
significantly correlated and that both are 
negatively related to withdrawnness. While 
the negative relationship between with- 
drawnness and aggressiveness is statisti- 
cally significant, it is not extremely high. 
Except for the negative relationship be- 
tween aggressiveness and leadership for 
boys, there are no statistically significant 
relationships between aggressiveness and 
the talent variables. Artistic talent was not
quantified throughout the entire range and
thus could not be correlated with the other
variables.


SUMMARY


The top 10% groups in intellectual
talent, social leadership ability, artistic
talent, aggressive maladjustment, and
withdrawn maladjustment were examined.
It was found that children who were highly
gifted in one of the three talent areas were
quite likely to be talented in other areas,
and quite unlikely to be seen as highly
maladjusted.


ATTITUDE CHANGE THROUGH UNDIRECTED
GROUP DISCUSSION

K. M. MILLER AND J. B. BIGGS 
University of Tasmania 


Studies of attitude change have empha- 
sized many variables; the present study 
was concerned with the effectiveness of 
free group discussion about racial groups 
when the discussion groups are socio- 
metrically structured. Some writers (3,
8, 11) have stressed the importance of
using sociometric structure as an aid to 
effective classroom work, suggesting that 
the educational process is more efficient 
when groups are composed of mutually 
attracted members, i.e. when groups are 
cohesive. 


DESIGN OF STUDY


Two types of groups were selected from 
a class—psychegroups considered high, and 
sociogroups low in cohesion. Attitude 
change was assessed by testing with an 
attitude scale before and after a period of 
undirected discussion about a number of 
racial groups. The stability of any change 
was assessed by a third test some weeks 
later. A control class completed the atti- 
tude scale at the same times but without 
intervening discussion. 

The interval between the first and sec- 
ond tests was designed to minimize memory 
of responses to the first. School vacation 
prevented the interval between the second 
and third tests being identical with the 
first interval. On no occasion were Ss informed
that subsequent tests would be given.


TECHNIQUES 


Sociometric. The conventional form of
the Moreno technique was used, Ss being
asked to write the names (up to five for
each category) of those classmates next
to whom they would like to sit and would
not like to sit.

* Now at National Foundation for Educational
Research, London.

Social Attitudes. A Bogardus type scale
was selected as the most suitable both for
repeated measurement and for showing
changes after discussion. The form used
was similar to the Zeligs and Hendrickson
(13) modification but had been independently
derived by the senior author
for a previous study. The steps were:

I would like to have live in my home. 

I would like to have as a close friend. 

I would like to go for a holiday with. 

I would like to have in my sports team. 

I would like to work with in school. 

I would like to have live in my street. 

I would like to have live in my country. 

So that the list of racial groups would be
meaningful for the Ss, 14 were selected
either on the basis of percentage of national
group among migrants to Australia
or for historical reasons, e.g., English and
Japanese. The groups were: American,
Chinese, Dutch, English, German, Indian,
Irish, Italian, Japanese, Jewish, Negro,
Polish, Russian, Balt.

Scoring was by the Zeligs method (12)
whereby each positive response counted
one point.
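
The scoring rule can be illustrated with a minimal sketch. It assumes, for illustration only, that each S's answers are stored as seven yes/no values per racial group (one per step of the scale); the names `STEPS`, `score_scale`, and the example answers are hypothetical and are not taken from the study. With 7 steps and 14 groups, the largest possible score under this rule would be 98.

```python
# Short labels for the seven steps of the scale listed in the text.
STEPS = [
    "live in my home",
    "close friend",
    "holiday with",
    "sports team",
    "work with in school",
    "live in my street",
    "live in my country",
]


def score_scale(responses):
    """Return (total, per_group); each positive response counts one point.

    `responses` maps a group name to a list of seven booleans, one per step.
    This data layout is an assumption made for the sketch.
    """
    per_group = {}
    for group, answers in responses.items():
        if len(answers) != len(STEPS):
            raise ValueError(f"expected {len(STEPS)} answers for {group}")
        per_group[group] = sum(1 for answer in answers if answer)
    return sum(per_group.values()), per_group


# Hypothetical answers for two of the 14 groups (not data from the study).
example = {
    "Dutch": [False, True, True, True, True, True, True],
    "Polish": [False, False, False, True, True, True, True],
}
total, per_group = score_scale(example)
```

A higher total therefore means more accepted steps, i.e., less social distance.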


SUBJECTS


Two third year secondary school classes
of 26 and 16 boys respectively were selected.
Of the larger class 24 members
with a mean age of 177 months, SD 8.5,
were experimental Ss while all of the
smaller class with a mean age of 180
months, SD 9.0 months, were control Ss.
The difference in age was not significant.


SELECTION OF DISCUSSION GROUPS


The method of analysis of sociometric
data suggested by Clark and Maguire (2)
was used, supplemented in the final choice
of groups by reference to sociograms.
Three friendship or psychegroups and
three neutral or sociogroups each containing
four boys were selected. The
psychegroups were chosen so that within
each (i) all members expressed strong
(1, 2, 3) choices for each member of the
group, (ii) each member received at least
one mutual choice, and (iii) no one was
rejected by other members of the group.
The composition of the sociogroups was
such that no member had expressed either
acceptance or rejection of any other
member.
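
The selection criteria can be sketched as two checks on the sociometric data. This is an illustration only, not the Clark and Maguire procedure: the data layout (`choices[a][b]` holding the rank 1 to 5 that boy `a` gave to boy `b`, and `rejections[a]` holding the names `a` rejected) is hypothetical, and criterion (i) is read loosely here as "each member gives at least one strong choice inside the group", which is an assumption about the intended meaning.

```python
def is_sociogroup(members, choices, rejections):
    """True if no member expressed acceptance or rejection of any other member."""
    for a in members:
        for b in members:
            if a == b:
                continue
            if b in choices.get(a, {}) or b in rejections.get(a, set()):
                return False
    return True


def is_psychegroup(members, choices, rejections, strong=(1, 2, 3)):
    """Check criteria (i) to (iii) under the assumed reading described above."""
    for a in members:
        others = [b for b in members if b != a]
        # (i) assumed reading: a gives at least one strong choice within the group
        if not any(choices.get(a, {}).get(b) in strong for b in others):
            return False
        # (ii) a takes part in at least one mutual choice within the group
        if not any(b in choices.get(a, {}) and a in choices.get(b, {}) for b in others):
            return False
        # (iii) no other member of the group rejects a
        if any(a in rejections.get(b, set()) for b in others):
            return False
    return True


# Hypothetical sociometric data for eight boys (not data from the study).
choices = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 3},
    "C": {"A": 2, "D": 1},
    "D": {"C": 1, "B": 2},
    "W": {"A": 1},
    "X": {"B": 2},
    "Y": {"C": 1},
    "Z": {"D": 4},
}
rejections = {}

friends = ["A", "B", "C", "D"]
neutrals = ["W", "X", "Y", "Z"]
friends_ok = is_psychegroup(friends, choices, rejections)
neutrals_ok = is_sociogroup(neutrals, choices, rejections)
```

In practice the final choice of groups was also guided by sociograms, which a mechanical check of this kind does not capture.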


PROCEDURE 


The task was presented as part of a
general study being conducted in several
countries, to find out how children thought
about people in their own and other
countries.² To encourage frankness Ss
were assured that all replies were confidential
and would not be seen by anyone
in the school. The sociometric and social
distance scales were then administered to
both control and experimental classes.

The initial administration of the social
distance scale was as recommended by
Bogardus (1), the E reading the items
at three-second intervals. On the later
occasions Ss were allowed to complete the
scale at their own rate.

Four weeks later the group discussions
began, friendship and neutral groups
meeting alternately. Each group was instructed
only after assembling and at the
end of the discussion the members
were asked not to discuss the activity
with the other boys. The E introduced the
discussion, explaining that he would like
the Ss to say something about a
number of racial groups. The discussion
lasted approximately 30 minutes,
two minutes being allowed for each of the
14 races, though discussion of any one
group was not abruptly terminated.

² A report is in preparation on the Inter[...],
the design of which is de[...]

During the discussion, E was passive
and nondirective, averting any questions 
put directly to him. Following the dis- 
cussion the social distance scale was re- 
administered. After half of the group dis- 
cussions had been completed the scale was 
given a second time to the control group. 

As a check on the stability of any change 
the scale was given a third time two weeks 
after the last group discussion to both 
control and experimental classes. 


RESULTS AND ANALYSIS 


Comparison of control and experimental 
groups. A t test of initial scores revealed 
no significant difference (t = 1.10, 38 df),
thus showing that the mean attitude level 
of the two could be considered equivalent. 

The effect of discussion. The mean scores 
for each of the three sections—psyche- 
groups, sociogroups, and control class—on 
first and second administrations were 
tested for differences. The differences for 
the friendly and neutral Ss were signifi- 
cant beyond the one per cent level (t = 
3.43 and 3.16, 11 df respectively) while the 
difference for the controls was not significant
(t = 0.32, 15 df).
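
The reported significance levels can be checked against a standard t table. The sketch below is an illustration, not the authors' computation: it assumes two-tailed tests, uses critical values copied from standard tables for 11 and 15 df, and the names `CRITICAL_T` and `smallest_level_reached` are hypothetical.

```python
# Two-tailed critical values of Student's t from standard tables,
# keyed by degrees of freedom and then by significance level.
CRITICAL_T = {
    11: {0.05: 2.201, 0.02: 2.718, 0.01: 3.106},
    15: {0.05: 2.131, 0.02: 2.602, 0.01: 2.947},
}


def smallest_level_reached(t, df):
    """Return the smallest tabled level whose critical value |t| exceeds, or None."""
    reached = [level for level, crit in CRITICAL_T[df].items() if abs(t) > crit]
    return min(reached) if reached else None


# t values and df as reported in the text for first versus second administration.
reported = {
    "psychegroups": (3.43, 11),
    "sociogroups": (3.16, 11),
    "controls": (0.32, 15),
}
levels = {name: smallest_level_reached(t, df) for name, (t, df) in reported.items()}
```

Under these assumptions both experimental sections exceed the one per cent critical value of 3.106, while the control value falls far short of the five per cent value.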

Differences between second and third
administrations. Inspection of Table 1
shows that the mean scores for the third
administration are lower than immediately
after the discussion period in the case of
the friendly and neutral Ss. These differences
were not significant for any of the
three sections (t values of 2.14, 1.27, and
0.86 for friends, neutrals, and controls
respectively).


TABLE 1

MEAN SOCIAL DISTANCE SCORES FOR
FRIEND AND NEUTRAL Ss BY GROUP

                    Administration
                 1         2         3
Friends
  Group 1      33.25     47.25     42.95
  Group 2      49.25     72.50     57.50
  Group 3      52.00     54.25     55.75
  Total        43.75     58.00     51.83
Neutrals
  Group 1      39.50     51.25     50.50
  Group 2      37.75     70.00     63.00
  Group 3      51.00     57.75     54.30
  Total        42.75     59.67     56.00


TABLE 2

NUMBER OF REMARKS OF EACH TYPE
MADE DURING DISCUSSION

               Favorable  Unfavorable  Neutral  Total
Psychegroups      151         133         42     326
Sociogroups       155         129         39     323

Differences between first and third administrations.
A similar analysis was made
of differences between first and third
sets of scores. The t test analysis showed
that final scores were significantly greater
than the initial scores for both psychegroup
and sociogroup Ss, at the two per
cent level for the former and at the five
per cent level for the latter. Again the
differences in the control class scores were
not significant (t = 0.38).

Quantitative aspects of the discussions. 
The remarks made by each S were re- 
corded and categorized as favorable, un- 
favorable or neutral towards the race 
under discussion. The total remarks are 
shown in Table 2 where it is seen that the 
total number of remarks and the distribu- 
tion according to category are approxi- 
mately equal for the psychegroups and 
sociogroups. Examination showed that in
the psychegroups remarks were somewhat
more evenly spread over all members than
in the sociogroups.
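
The similarity of the two distributions of remarks can be made concrete with the counts from Table 2. The sketch below computes the proportion of each type of remark and, as an added illustration that is not part of the original analysis, a Pearson chi-square statistic for the 2 x 3 table; the function names are hypothetical.

```python
# Counts of remarks from Table 2.
REMARKS = {
    "psychegroups": {"favorable": 151, "unfavorable": 133, "neutral": 42},
    "sociogroups": {"favorable": 155, "unfavorable": 129, "neutral": 39},
}


def proportions(counts):
    """Share of each remark type within one kind of group."""
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


def chi_square(table):
    """Pearson chi-square statistic for a rows-by-columns table of counts."""
    rows = list(table)
    cols = list(table[rows[0]])
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_totals.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_totals[r] * col_totals[c] / n
            stat += (table[r][c] - expected) ** 2 / expected
    return stat


props = {group: proportions(counts) for group, counts in REMARKS.items()}
stat = chi_square(REMARKS)
# With 2 degrees of freedom the tabled .05 critical value is 5.991.
distributions_differ = stat > 5.991
```

The statistic is well below the critical value, which agrees with the statement that the distributions are approximately equal.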


DISCUSSION


The analysis has shown that the members
of both psychegroups and sociogroups
show more tolerance (decreased social
distance) after free undirected discussion
of some characteristics of different racial
groups. No such change is shown by control
Ss tested after the same interval of
time. The amount of initial change appears
to be unrelated to type of group as the
mean difference between friends and neutrals
is not significant.

When the same scale is applied again
after an interval of two weeks the mean
scores of members of both types of experimental
group are closer to the mean
scores in the first testing than they were
on the second. The amount of this reversion
is, however, not statistically significant,
indicating that the favorable
change engendered by the discussion was
fairly stable over this short period. Further
confirmation was provided by the
comparison of the initial and final scores
which were significantly different for both
psychegroup and sociogroup subjects, at
two and five per cent levels respectively.

Few, if any, investigators have suggested
that free undirected discussion
about racial groups would lead to the
measurable changes demonstrated in this
study. Some investigators (6, 9) indicate
that attempts to change attitudes are more
successful when Ss are members of naturally
working groups. While a class is in
some respects a functioning group it has
within it a number of groups which are
more cohesive than the class as a whole.
Thus it would not have been surprising to
find the members of the psychegroups
showing greater and more stable change
than the sociogroup members. The results
of the present study are not in accord with
such expectations as both friends and
neutrals show (approximately) equally
significant changes and stability of change.
Moreover, they did not differ from each
other in degree of change throughout.³
³ Unrelated t tests on difference scores
between psychegroup and sociogroup members
at each stage were nonsignificant.

The present findings are, however, in
keeping with the suggestion that close
friendships may lead to better communication
and wider participation in the
group (6). The evidence for better communication
is provided by notes of the
actual discussion sessions. The climate in
the psychegroups was freer and more
lively and discussion more spontaneous
than that in the sociogroups where discussion
tended to be more formal and
reserved, though no less frequent, and
not different in the proportion of favorable,
unfavorable and neutral remarks.
Thus although quantitative differences between
friend and neutral groups were
not discovered, there is a difference in the
way in which the quantitative result was
achieved.

Some investigators (8, 10) have considered
such changes in terms of group
conformity, suggesting that there would
be a greater tendency for members of
sociogroups, being less cohesive than
psychegroups, to establish some norm or
common position. Others (4, 9) have considered
such changes as a function of
security-insecurity and personality
adjustment-maladjustment, suggesting that
less secure, less well-adjusted persons may,
as a means of establishing a more secure
group relationship, change towards a central
position. When the results of the
present study were examined for evidence
of conformity it was found that for all
three sociogroups and for none of the
psychegroups the range of scores after
discussion was considerably smaller than
before.
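
The conformity check described above compares the range of scores within a group before and after discussion. A minimal sketch follows; the scores are hypothetical (the individual scores are not reported in the text), and since no criterion for "considerably smaller" is given, the function only tests whether the range decreased.

```python
def score_range(scores):
    """Range of a set of social distance scores (highest minus lowest)."""
    return max(scores) - min(scores)


def range_shrank(before, after):
    """True if the spread of scores is smaller after discussion than before."""
    return score_range(after) < score_range(before)


# Hypothetical scores for one four-member group; not data from the study.
before = [20, 35, 48, 55]
after = [45, 50, 52, 58]
converged = range_shrank(before, after)
```

Here the range falls from 35 to 13 points, the kind of movement toward a common position that the conformity interpretation would predict.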

The relevance of these findings for
education requires consideration as the results
seem to be at variance with those
claimed by proponents of the
sociometric approach. Work of investigators
such as Cunningham, Oeser, and
[illegible] suggests that group discussion
would be more effective in the psychegroups
than in the sociogroups, whereas
it has been shown in this study that measured
changes are approximately equal for
both types of groups. Further research is
required to ascertain whether the changes
are primarily a function of the learning
process, in one case, and a function of
personality factors and insecurity in the
other.


SUMMARY AND CONCLUSIONS 


This study was an attempt to relate
attitude change to free discussion in
psyche- and sociogroups. Several findings
are definite while others merit further
investigation.

1. Free, undirected discussion about
racial groups by two types of small groups,
selected on a sociometric basis, resulted
in a significant change of attitude irrespective
of the type of group. Further,
this change was relatively stable over a
short period.

2. Contrary to expectations from socio- 
metric studies in the classroom, and from 
studies of group structure, no significant 
differences between the quantitative 
changes of friendly and neutral Ss were 
discovered. 

3. Nevertheless, it was suggested that
the psychological processes in the two
types of groups might be different and
that further investigation is necessary
to show whether the tendency for the
scores of members of sociogroups to come
closer to a central position after discussion
is a function of personality adjustment.


RELATIONSHIP OF SELF-ACCEPTANCE TO OTHER VARIABLES
WITH SIXTH GRADE CHILDREN ORIENTED IN
SELF-UNDERSTANDING¹

PAUL BRUCE²

Child Welfare Research Station, State University of Iowa 


The purpose of this study was to in- 
vestigate with sixth grade children some 
of the relationships between a measure of 
self-acceptance and other personality var- 
iables. The relationships were studied in 
two groups of Ss: one consisting of pupils 
who had taken part in a learning program 
designed to develop a more understanding 
and analytical approach to their own and 
others’ behavior (the experimental group) 
and one group consisting of pupils who 
had not undertaken such a program (the 
control group). 

A review of the literature—particularly 
the writings of Rogers (15) and other 
researchers with a client-centered therapy 
orientation—suggested that self-accept- 
ance be defined as the congruence between 
the way a person thinks himself to be 
(self-concept) and the way he thinks he 
would most like to be (ideal-self). Accordingly,
in this study, a measuring instrument
from which Self-Ideal Discrepancy
scores were obtained was devised
suitable to a sixth grade population.

Within the framework of a self-ideal
discrepancy measure of self-acceptance, a
child’s indication of his ideal-self can be
viewed as his expression of how he feels
some of his psychological needs (such as
the needs for security, personal worth,
status, etc.) can be best satisfied. In
other words, underlying the conception
of ideal-self is the assumption by the child
that if he becomes more like his ideal-self,
he will be better able to satisfy some
of his secondary needs. For example, implicit
in a child’s wanting “to be better
looking,” “to have more friends,” “to be
less sensitive,” etc., is the feeling that one’s
affectional, security, and status needs
(among others) would be better satisfied
if these ideals were achieved. If a child,
then, feels he is quite unlike his ideals or
feels he is not making progress towards
his ideals, it might be speculated that adequate
satisfaction of several of his secondary
needs is being blocked in some
way, and thus several predictions concerning
his behavior might be made.

¹ This paper is based on a doctoral dissertation
submitted to the Graduate College
of the State University of Iowa in 1957. The
writer is indebted to Ralph H. Ojemann
for his encouragement and guidance throughout
the study and to the members of the
Preventive Psychiatry Project staff for their
invaluable assistance.

² Now at San Diego State College.

It would be expected that a child who 
indicates a marked discrepancy between 
his self-concept and ideal-self, feeling his 
need satisfaction blocked, would show evi- 
dences of insecurity in his behavior and 
would yield responses indicating manifest 
anxiety. Such relationships between meas- 
ures of self-acceptance and various meas- 
ures of insecurity and anxiety have been 
obtained with subjects of college age and
[...] to see if the relationship between self-ideal
discrepancy (as a measure of self-acceptance)
and measures of observed
insecurity and manifest anxiety held for
a group of sixth grade children.

The orientation program which the ex- 
perimental group of Ss had undertaken 
was designed to help each pupil develop 
a “causal,” analytical orientation towards 
his social environment. The program was 
based on the assumption that a child who 
is provided with the opportunity for 
understanding some of the many causes 
underlying his own behavior and the be- 
havior of people about him will be able 
to make more effective adjustments. In- 
volved in the program was the special 
training of the teachers and the use of cer- 
tain special curricular materials, the de- 
tails of which have been published else- 
where (9, 11, 12). 

With respect to this investigation, the 
orientation program provided a group of 
Ss who had been trained in self-under- 
standing, among other things. Several 
writers, such as Jersild (6) and Rogers 
(14), have indicated that an important 
characteristic of a self-accepting person 
is self-understanding. The question arises,
then, as to whether a child with increased
self-understanding is thereby more self-accepting.
It might be hypothesized that
as a child gains insight into his own motivations
and dynamics (that is, gains in
self-understanding) he would give evidence
of being more self-accepting. Also,
to the extent that the orientation program
helps the individual learn how to
work out his daily situations more effectively,
he may be helped to make
progress towards his ideal, thereby reducing
the self-ideal discrepancies.

On the other hand, increased self-understanding
may not affect self-acceptance,
particularly when self-acceptance is
defined in terms of self-ideal discrepancy.
For example, an individual may see himself
some distance from his ideal, but with
increased self-understanding, understand
some of the reasons for this situation. In
such a case, self-ideal discrepancy might
remain high even after the orientation
program, although the individual would
feel better (that is, be less anxious) about
it. Thus, the orientation program might
serve to reduce anxieties which normally
accompany high self-ideal discrepancy
without necessarily reducing the discrepancy
itself. It appears, therefore, that the
orientation program could have diverse
effects upon a self-ideal discrepancy measure
of self-acceptance.

The specific questions to which answers
were sought in this study are: (a) Is there
a significant difference between Ss with
high self-ideal discrepancy and those with
low self-ideal discrepancy in average scores
on a test of manifest anxiety? (b) Does
this relationship hold for Ss who have
taken part in an orientation program in
self-understanding? (c) Is there a significant
difference between Ss with high self-ideal
discrepancy and those with low self-ideal
discrepancy in average ratings on an
observation scale of insecurity?


PROCEDURE 


Measures used in study. A Self-Acceptance
Scale was devised and initially administered
to several sixth grade classes.
The results and items were analyzed qualitatively
and quantitatively. The 10 items
finally selected for the revised Self-Acceptance
Scale, in general, reflected affective
characteristics about which individuals
in our culture were thought to have
substantial feelings. (Only nine of these
items were scored since one proved ambiguous
in subsequent administrations of
the scale.) A more detailed description
and a copy of this scale appear elsewhere
(3).

Examples of typical items are these:
“1. This is someone who feels that others
don’t like him or her, someone whom
nobody seems to care about;” and “This
person is happy and cheerful most
of the time, one who seems to enjoy what
he or she does.” Some of the items in the
scale were adapted from the Social Analysis
of the Classroom Inventory developed
by the Horace Mann-Lincoln Institute of
School Experimentation (5). Descriptive
statements were utilized in place of the
trait-names (used on most other scales
of this type) in order to reduce the amount
of ambiguity.

The Ss were asked to answer four questions
about each of the descriptive statements:


1. Would most boys and girls my age like
to be like this person? (This question was
not scored but was included as a check on
the general favorability or unfavorability of
each item.)

2. Am I like this person? (Self-Concept)

3. Am I becoming more like this person?
(The results obtained by the use of this
question were inconclusive and are not discussed
in this paper.)

4. Would I like to be like this person?
(Ideal-Self)


The S answered each question indicating
to what extent he felt it was true by
checking along a 5-point scale reading:
“very much so,” “quite a bit,” “somewhat,”
“not very much,” and “not at all.”

The particular score yielded by this
scale which was used in the major analyses
reported in this paper is the Self-Ideal
Discrepancy score. This score is the discrepancy
between the rating in answer
to Question 2 (Self-Concept) and the
rating in answer to Question 4 (Ideal-Self)
summed for the nine items. Test-retest
reliability for this score is reported
in the next section.
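
The computation of the Self-Ideal Discrepancy score can be sketched as follows. Two points are assumptions made for illustration, since the text does not state them: the five scale positions are coded 5 to 1, and the discrepancy on each item is taken as the absolute difference between the two ratings. The names `RATING_VALUES` and `self_ideal_discrepancy` are hypothetical.

```python
# Assumed numeric coding of the 5-point scale (not stated in the text).
RATING_VALUES = {
    "very much so": 5,
    "quite a bit": 4,
    "somewhat": 3,
    "not very much": 2,
    "not at all": 1,
}


def self_ideal_discrepancy(self_concept, ideal_self):
    """Sum over items of |Question 2 rating - Question 4 rating|.

    Both arguments are lists of scale labels, one entry per scored item.
    Using the absolute difference is an assumption of this sketch.
    """
    if len(self_concept) != len(ideal_self):
        raise ValueError("both lists must cover the same items")
    return sum(
        abs(RATING_VALUES[s] - RATING_VALUES[i])
        for s, i in zip(self_concept, ideal_self)
    )


# Hypothetical answers of one child to the nine scored items.
self_answers = ["somewhat"] * 9
ideal_answers = ["very much so"] * 5 + ["not at all"] * 4
score = self_ideal_discrepancy(self_answers, ideal_answers)
```

Under this coding the score for nine items can run from 0 (self-concept and ideal-self identical) to 36, with higher scores indicating lower self-acceptance.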

The other measures used in this study
included the Children’s Manifest Anxiety
Scale developed by Castaneda et al. (4)
and the Kooker Security-Insecurity Rating
Scale. The Children’s Manifest Anxiety
Scale consists of 42 anxiety items which
are answered “yes” or “no.” The Kooker
Scale (8) gives evidence of children’s behavior
which is independent of their responses
on a paper-pencil type test and
which reflects their “typical” behavior patterns.
This scale requires a trained observer
who is familiar with the child’s behavior to
rate him on a series of 19 behavior items,
as occurring “frequently,” “fairly often,”
or “seldom.” In the present study, each
classroom was visited for a five-day period
and the children rated by one of the two
trained observers (who remained ignorant
of the nature of this investigation). Both
observers practiced rating the same Ss
with the Kooker scale until they were in
substantial agreement with each other.

With the exception of the Kooker scale,
the other measures were administered to
all classes by the investigator. The teachers
were not involved in the measurement aspects
of the investigation and were not
aware of the nature of the study.

Subjects of investigation. The subjects
in this study were pupils of eight sixth 
grade classes in different elementary 
schools located in comparable neighbor- 
hoods with respect to socioeconomic status. 
The four experimental classes had under- 
gone the orientation program described 
above. Most of the pupils in two of these 
experimental classes had been in the pro- 
gram for two consecutive years; whereas 
the pupils in the remaining two experi- 
mental classes had undergone the orienta- 
tion program for the current year only. 
At the beginning of the program, four 
control classes were selected controlling 
for some of the teacher variability. Thus, 
for every teacher of an experimental class, 
a control teacher was selected and matched 
on the basis of several variables including 
age, sex, number of years teaching experi- 
ence, and educational level. Unlike the 
experimental teachers, the control teach- 
ers did not have special training for this 
program and did not carry on the planned 
learning program in their classes. 

With respect to the Ss themselves, in
the experimental classes 50 boys and 48
girls completed all of the testing involved
in this study, while in the control classes
tests were completed by 40 boys and 46
girls. The average IQ was 105.9 for the
experimental classes and 105.4 for the control
classes as measured by the Otis Self-Administering
Test of Mental Ability. As
far as could be determined from school
officials, no systematic method was used
to assign the pupils to particular classes
or to teachers (except for the two-year
experimental group, who were kept together
the second year specifically for the
orientation program). Thus, with the Ss
coming from comparable home backgrounds,
being randomly placed in their
particular class groups, and being undifferentiated
with respect to average intelligence
scores, the pupils in all eight
classes were assumed to be originally a
random sampling of a population of such
pupils in their particular schools.

Statistical analyses. To test two major
problems of this investigation, that of
establishing significant relationships between
self-ideal discrepancy and measures
of manifest anxiety and observed insecurity,
a three-dimensional (2 × 2 × 2)
analysis of variance design was used (10).
The factors controlled in this design were
sex, the experimental-control condition,
and self-acceptance. The Ss were divided
into two groups according to their scores
on the Self-Acceptance Scale: one group
(the Highs) made up of those whose Self-Ideal
Discrepancy scores fell above the
median; the other group (the Lows) with
those whose Discrepancy scores fell below
the median. This median was computed
using all 184 of the main group of
Ss. In order to have equal frequencies in
each cell of this design, 20 Ss were randomly
selected from each group which
represented a different combination of the
factors being controlled; thus, there were
160 Ss used in the analysis of variance
tests. The dependent variables in the
analyses of variance were the scores from
the Children’s Manifest Anxiety Scale
and the Kooker Security-Insecurity Rating
Scale.


TABLE 1

TEST-RETEST (ONE WEEK INTERVAL)
RELIABILITY CORRELATION COEFFICIENTS
OF THE SEVERAL SCORES OF THE
SELF-ACCEPTANCE SCALE FOR EACH OF
THE TWO CLASSES STUDIED

             N    Self-Concept   Ideal-Self   Self-Ideal Discrepancy
Class I     21        .83           .69               .80
Class II    26        .93           .54               .86


The possible effects of the orientation
program on the relationship between self-ideal
discrepancy and anxiety were tested
by t tests of a difference in means between
the experimental and control groups.
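
The median split and the random selection of 20 Ss per cell described under Statistical analyses can be sketched as follows. The subject records, field names, and the decision to drop scores falling exactly on the median are assumptions made for illustration; the data are generated artificially and are not the study's data.

```python
import random
from statistics import median


def assign_cells(subjects):
    """Group subjects by (sex, condition, High/Low) using an overall median split."""
    cut = median(s["discrepancy"] for s in subjects)
    cells = {}
    for s in subjects:
        if s["discrepancy"] == cut:
            continue  # handling of ties at the median is an assumption
        level = "High" if s["discrepancy"] > cut else "Low"
        cells.setdefault((s["sex"], s["condition"], level), []).append(s)
    return cells


def balance(cells, per_cell, seed=0):
    """Randomly keep `per_cell` subjects in every cell (equal cell frequencies)."""
    rng = random.Random(seed)
    return {key: rng.sample(members, per_cell) for key, members in cells.items()}


# Artificial subjects: 50 per sex-by-condition group, discrepancy scores 0..49.
subjects = [
    {"sex": sex, "condition": condition, "discrepancy": score}
    for sex in ("boy", "girl")
    for condition in ("experimental", "control")
    for score in range(50)
]
cells = assign_cells(subjects)
balanced = balance(cells, per_cell=20)
n_used = sum(len(members) for members in balanced.values())
```

With two sexes, two conditions, and two discrepancy levels there are eight cells, so 20 Ss per cell gives the 160 Ss used in the analyses of variance.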


RESULTS


Results of preliminary testing. Test-retest
reliability of the Self-Acceptance
Scale was determined by administering this
test to two classes of sixth graders on two
occasions, one week apart. Coefficients of
correlation for the two administrations
for the various sections of the scale are
shown in Table 1.

Although all of these coefficients are
significant beyond the one per cent level
of confidence, it will be noted that those
for the Ideal-Self scores are somewhat
lower than the others. This can be attributed
to the limited range characteristic
of this particular score; that is, the
pupils tended not to vary from one another
in the rating of their ideal-self concepts.
Thus, the correlation test was particularly
sensitive even to the smallest
deviations from one administration to the
next.

In order to test whether or not intelligence
was likely to be an important factor
in the self-acceptance score, IQ scores
from the Otis Self-Administering Test of
Mental Ability were secured from school
records for all pupils who participated in
the testing program. The coefficient of
correlation between the Self-Ideal Discrepancy
scores and IQ scores for the 47
pupils participating in the preliminary
study was -.08, indicating intelligence was
not a factor.
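
The significance statements about these correlations can be checked by converting each r to a t value with n - 2 degrees of freedom. The sketch below is an illustration rather than the author's computation; it assumes two-tailed tests and uses critical values copied from standard t tables, and the function name `t_from_r` is hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-tailed critical values from standard t tables.
CRITICAL_01 = {19: 2.861, 24: 2.797}  # .01 level
CRITICAL_05 = {45: 2.014}             # .05 level

# The two lowest coefficients in Table 1 (Ideal-Self scores).
t_class_1 = t_from_r(0.69, 21)   # Class I, df = 19
t_class_2 = t_from_r(0.54, 26)   # Class II, df = 24

# Correlation of Self-Ideal Discrepancy with IQ for the 47 pupils, df = 45.
t_iq = t_from_r(-0.08, 47)

lowest_reliabilities_significant = (
    t_class_1 > CRITICAL_01[19] and t_class_2 > CRITICAL_01[24]
)
iq_significant = abs(t_iq) > CRITICAL_05[45]
```

Even the lowest reliability coefficients exceed the one per cent critical value, while the correlation with IQ falls far short of the five per cent value.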

Analyses using Self-Acceptance Scale.
Tables 2 and 3 show the results of analysis
of variance tests using mean scores
from the Children’s Manifest Anxiety
Scale and the Kooker Security-Insecurity
Rating Scale, respectively, when the Ss
were divided into two groups, Highs and
Lows, according to whether they fell
above or below the median Self-Ideal Discrepancy
score. These tables show that
the differences in the means on both the
anxiety and insecurity scales between the
Highs and Lows were statistically significant
beyond the .01 level of confidence.
Furthermore, these differences were in the
direction such that the group with the
relatively high self-acceptance (that is,
low discrepancy scores) yielded average
scores indicating less manifest anxiety, as
measured, and less insecurity, as rated by
observers.

One qualification to these findings should
be noted. In the analysis of variance tests,
the interaction effects were not significant
at the .05 level of confidence (the level
prescribed prior to the investigation);
however, there was a tendency for an
interaction to exist between the Self-Ideal
Discrepancy scores and the experimental-control
condition. (The F test of this interaction
between the discrepancy scores
and the experimental-control condition
produced ratios of 3.14 and 3.29 for the
Anxiety and Kooker scores, respectively,
which are significant between the .10 and
.05 levels of confidence.) This tendency
for an interaction indicates that the discrepancy
measure of self-acceptance was
related to the personality variables of
anxiety and insecurity differentially according
to whether or not the Ss were in
the orientation program. The implications
of this possible interaction will be discussed
below.


TABLE 2

MEANS AND ANALYSIS OF VARIANCE OF
SCORES ON THE CHILDREN’S MANIFEST
ANXIETY SCALE OF SUBJECTS DIVIDED
INTO “HIGHS” AND “LOWS” ACCORDING
TO THEIR SELF-IDEAL DISCREPANCY
SCORES ON THE SELF-ACCEPTANCE SCALE
(N = 20 IN EACH SEX, CLASS-TYPE GROUPING)

                          Children’s Manifest Anxiety Scale
Self-Ideal Discrepancy    Boys     Girls    Total (Highs, Lows)    F
Highs                                             16.78
  Exptl. Classes          12.10    18.55
  Control Classes         17.05    19.40
                                                                 13.40**
Lows                                              12.56
  Exptl. Classes          13.05    13.25
  Control Classes         11.70    12.25
Total (Boys & Girls)      13.48    15.86
F                             4.30*

* Significant beyond the .05 level of confidence.
** Significant beyond the .01 level of confidence.
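
The marginal means in Table 2, and the tendency toward an interaction noted above, can be reproduced from the eight cell means. The sketch below uses the cell means as printed in Table 2 and relies on the equal cell sizes (N = 20), so that marginal means are simple averages of cell means; the function names are hypothetical.

```python
# Cell means from Table 2, keyed by (discrepancy level, class type, sex).
CELL_MEANS = {
    ("High", "Exptl", "Boys"): 12.10,
    ("High", "Exptl", "Girls"): 18.55,
    ("High", "Control", "Boys"): 17.05,
    ("High", "Control", "Girls"): 19.40,
    ("Low", "Exptl", "Boys"): 13.05,
    ("Low", "Exptl", "Girls"): 13.25,
    ("Low", "Control", "Boys"): 11.70,
    ("Low", "Control", "Girls"): 12.25,
}


def marginal_mean(position, level):
    """Average of the cell means sharing `level` at index `position` of the key."""
    values = [m for key, m in CELL_MEANS.items() if key[position] == level]
    return sum(values) / len(values)


def high_low_gap(class_type):
    """Mean anxiety of Highs minus mean anxiety of Lows within one class type."""
    highs = [m for (d, c, _), m in CELL_MEANS.items() if d == "High" and c == class_type]
    lows = [m for (d, c, _), m in CELL_MEANS.items() if d == "Low" and c == class_type]
    return sum(highs) / len(highs) - sum(lows) / len(lows)


highs_mean = marginal_mean(0, "High")
lows_mean = marginal_mean(0, "Low")
boys_mean = marginal_mean(2, "Boys")
girls_mean = marginal_mean(2, "Girls")
gap_exptl = high_low_gap("Exptl")
gap_control = high_low_gap("Control")
```

The four marginal means agree with the tabled totals to two decimals, and the difference between Highs and Lows is roughly three times larger in the control classes than in the experimental classes, which is the pattern the near-significant interaction describes.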

The results reported thus far have indicated
a relationship between self-acceptance,
as measured, and the measures of
manifest anxiety and observed insecurity. 
So as to obtain further information con- 
cerning the extent of this relationship, a 
correlational analysis of the major vari- 
ables was made, and the results are re- 
ported in Table 4. The coefficients of cor- 
relation for Self-Ideal Discrepancy scores 
with the Anxiety scores and Kooker scores 
were each significant beyond the .01 level 
of confidence. 

TABLE 3

MEANS AND ANALYSIS OF VARIANCE OF
SCORES ON THE KOOKER SECURITY-INSECURITY
SCALE OF SUBJECTS DIVIDED
INTO “HIGHS” AND “LOWS”
ACCORDING TO THEIR SELF-IDEAL
DISCREPANCY SCORES ON THE
SELF-ACCEPTANCE SCALE
(N = 20 IN EACH SEX, CLASS-TYPE GROUPING)

                          Kooker Security-Insecurity Scale
Self-Ideal Discrepancy    Boys     Girls    Total (Highs, Lows)    F
Highs                                             25.10
  Exptl. Classes          24.90    25.65
  Control Classes         25.20    24.64
                                                                 13.42*
Lows                                              22.81
  Exptl. Classes          22.19    21.60
  Control Classes         24.75    22.80
Total (Boys & Girls)      24.94    23.68
F                              <1

* Significant beyond the .01 level of confidence.


TABLE 4

COEFFICIENTS OF CORRELATION BETWEEN
THE SELF-IDEAL DISCREPANCY SCORE OF
THE SELF-ACCEPTANCE SCALE AND THE
MEASURES OF ANXIETY AND INSECURITY

                              Self-Ideal Discrepancy
                           Boys     Girls    Total
Manifest Anxiety Scale     .28*     .41*     .35*
Kooker Rating Scale        .83*     .34*     .32*

* Significant beyond the .01 level of confidence.


Effects of the orientation program. Analysis
of the data indicating the effects of the
orientation program on the variables measured
appears in Table 5. Separation of the
experimental classes made possible a comparison
of the effects between exposure to
the program for one year and for two
years. It will be noted that significant differences
on the measures of manifest anxiety
and observed insecurity appear between
the classes having had the program
for two years and each of the other two
groups involved: those having had the
program for one year and those in the control
classes. These differences are in a direction
which indicates that at least two
years’ exposure to the orientation program
may serve to lower manifest anxiety
and reduce observed insecurity. Significant
differences between the two-year experimental
group and the one-year experimental
and control groups were not found
with the Self-Ideal Discrepancy scores.

An interesting question is, how do the 
Highs and the Lows with respect to Self- 
Ideal Discrepancy in the two-year experi- 
mental group compare with those in the 
control group? This analysis is given in 
Table 6. 

Inspection of this table shows that there
is a difference in means between the High
two-year experimental group and the
control group in manifest anxiety which is
significant beyond the .01 level of
confidence. In other words, it appears that
those in the two-year experimental group
who retained a relatively high Self-Ideal
Discrepancy appeared to indicate less
manifest anxiety (as measured) than did
the control Ss who also had a relatively
high Self-Ideal Discrepancy. This lends
some support to the contention that at
least a two year exposure to the orientation
program may allow individuals to feel
more comfortable about discrepancies
which they feel exist between their
self-concepts and ideal-self concepts.

Inspection of Table 6 also reveals that
the difference between the Highs and Lows
in manifest anxiety in this two-year
experimental group is not significant. In
other words, it appears that the major
results concerning the relationship between
Self-Ideal Discrepancy scores and manifest
anxiety should be qualified. High Self-Ideal
Discrepancy scores may be associated
with high Manifest Anxiety scores only
for Ss with insufficient self-understanding;
this relationship may not apply to Ss
experiencing an orientation program in
self-understanding for at least two years. This
finding also may explain the tendency for
interaction effects to exist between the
Self-Ideal Discrepancy scores and the
experimental-control condition, particularly
in the case of the Anxiety test analysis
which was noted above.


TABLE 5

MEANS AND t TESTS BETWEEN ONE AND TWO YEAR EXPERIMENTAL GROUPS AND
BETWEEN THE TWO YEAR EXPERIMENTAL GROUP AND CONTROL GROUP ON
MEASURES OF SELF-ACCEPTANCE, ANXIETY, AND INSECURITY

                               Self-Ideal        Manifest         Kooker
                               Discrepancy Score Anxiety Scale    Rating Scale
Group                     N    Mean     t        Mean     t       Mean     t
One-Year Exptl. Group     53   7.98              15.49            24.03
                                        0.87              2.38*            2.24*
Two-Year Exptl. Group     45   7.16              11.93            22.42
                                        0.82              2.30*            2.24*
Control Group             86   7.88              14.97            24.12

* Significant beyond the .05 level of confidence.


TABLE 6

MEANS AND t TESTS ON THE CHILDREN’S
MANIFEST ANXIETY SCALE BETWEEN
THE TWO YEAR EXPERIMENTAL AND
CONTROL GROUPS AND BETWEEN THE
HIGHS AND LOWS IN SELF-IDEAL
DISCREPANCY SCORES

                         Children’s Manifest Anxiety Scale
                         Highs       Lows        Difference   t
Two-Year Exptl. Group    11.88       12.04       0.16         n.s.
                         (N = 16)    (N = 26)
Control Group            18.22       11.98       6.24         4.00*
                         (N = 40)    (N = 40)
Difference               6.34        0.06
t                        2.92*       n.s.

* Significant beyond the .01 level of confidence.


DISCUSSION


The validity of the Self-Acceptance
Scale. The results reported above indicated
significant relationships between the
Self-Ideal Discrepancy measure of
self-acceptance and scores from the Children’s
Manifest Anxiety Scale and the Kooker
Security-Insecurity Rating Scale. These
findings corroborate, for sixth-grade children,
the findings of various other investigators who
used similar measures with older Ss. The
validity of the Self-Acceptance Scale is
further substantiated by the fact of its
relationship to an observation measure
(the Kooker scale) which was independent
of the children’s paper-pencil responses.

The relatively low correlation between
the Kooker and the Anxiety scales (N =
184, r = .26) may indicate that these two
instruments are measuring different vari- 
ables, and the Self-Acceptance Scale may 
be tapping some of the variables distinc- 
tive to each of these other two scales as is 
indicated by the following coefficients of 
correlation (N = 184): (a) Correlation 
between Self-Ideal Discrepancy scores and 
Anxiety scores = .35; (b) Correlation be- 
tween Self-Ideal Discrepancy scores and 
Kooker scores = .32. 
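
The three coefficients quoted above can be put on a common footing by converting each to a t statistic and to a proportion of shared variance. The sketch below is only an illustration: the conversion t = r·sqrt(N − 2)/sqrt(1 − r²) is the standard test of a Pearson r against zero, but the source does not say which test the author used, and the function names are hypothetical. With N = 184 the two-tailed .01 critical value of t is roughly 2.6.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def shared_variance(r: float) -> float:
    """Proportion of variance two measures have in common (r squared)."""
    return r * r


N = 184  # number of pupils reported in the text
reported = {
    "Kooker vs. Anxiety": 0.26,
    "Discrepancy vs. Anxiety": 0.35,
    "Discrepancy vs. Kooker": 0.32,
}

for label, r in reported.items():
    print(f"{label}: t = {t_from_r(r, N):.2f}, "
          f"shared variance = {shared_variance(r):.3f}")
```

Even the largest coefficient leaves most of the variance unshared, which is in keeping with the reading that the three instruments tap partly different variables.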

The results of this investigation (par- 
ticularly the analyses of the effects of the 
orientation program to be discussed in 
more detail below) raise a serious question
as to the adequacy of the self-ideal congruence
concept of self-acceptance. A discrepancy
between self and ideal might well
mean different things to different individ- 
uals. While to one person a self-ideal dis- 
crepancy might be a threat to his self sys- 
tem, to another such a discrepancy might 
indicate that his aspirations are high and 
serve as a challenge to him. What seems 
to be important is not the discrepancy it- 
self, but the feelings about it. Thus, as the 
discussion below will indicate, it isn’t nec- 
essarily the high discrepancy alone which 
is associated with anxiety but a high dis- 
crepancy under certain conditions. 

The effects of the orientation program. 
In the introductory discussion, it was 
pointed out on an a priori basis that the 
orientation program might be expected 
to have diverse effects on the behavior of 
children. Among the early effects of such 
a program may be the reduction of the 
anxiety which seems to accompany a 
high self-ideal discrepancy. Bearing this 
out was the tendency for interaction ef- 
fects to exist, particularly in the analyses 
of the Anxiety scores, between the Dis- 
crepancy scores and the experimental-con- 
trol condition. This indicates that the re- 
lationship between the Discrepancy scores 
and the measure of anxiety was not the 
same for the experimental classes as for 
the control classes. 

Further evidence indicating the nature 
of this tendency for interaction is obtained 
when analysis is made of the scores of 
those classes in which most of the pupils 
had had the orientation program for a 
two-year period. It will be recalled from 
Table 6 that those with high Self-Ideal 
Discrepancy scores in the two-year ex- 
perimental group had average Anxiety 
scores which were significantly lower than 
those in the control group. Thus support 
is given to the contention that at least a 
two year exposure to an orientation in 
self-understanding may allow individuals 
to feel more comfortable about discrepancies
which they feel exist between their
self-concepts and ideal-self concepts.

Indication that a period longer than one 
year in the orientation program may be 
needed for measurable changes to occur 
in such variables as manifest anxiety and 
observed insecurity was seen in the re- 
sults reported in Table 5. It will be re- 
called that in this analysis, the two-year 
experimental group had average scores 
which indicated less manifest anxiety and 
less observed insecurity when comparisons 
were made with the one-year experimental 
group and the control group. Even though 
the differences of Self-Ideal Discrepancy 
scores between these groups were not sta- 
tistically significant, the discussion above 
indicated that the feelings about a high 
discrepancy, where it existed, may have 
been changed as a result of the two-year 
orientation program. 

Obviously, a single study is not sufficient 
to establish the validity of this finding. 
Several studies will be needed to check 
these results, but those from this investiga- 
tion—that is, the tendency toward inter- 
action effects noted in the analyses of the 
Self-Acceptance Scale and the study of the 
two-year experimental group—suggest that 
anxieties relative to high self-ideal dis- 
crepancies may be more prevalent for 
those with insufficient self-understanding 
than for those who have been trained in 
understanding themselves. 


SUMMARY 


The purpose of this investigation was to
construct a measuring instrument which
could be reliably used in the investigation
of purported relationships between
self-acceptance and other important
personality variables and in the study of the
possible effects a learning program concerned
with the causes of behavior might have on
the participants in such a program.

The Self-Acceptance Scale, constructed
to measure self-acceptance in sixth grade
children, consisted of a series of descriptive
statements with which each child
rated himself according to the extent he
felt he was like the description (the
self-concept) and to what extent he felt he
wanted to be like the description
(ideal-self). Self-acceptance was defined as the
congruence (that is, relative lack of
discrepancy) between self-concept and
ideal-self.
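
The definition of self-acceptance as a lack of discrepancy between the two sets of ratings can be sketched in code. This is a hypothetical scoring rule, not the author's: the summary does not give the rating scale or the arithmetic, so the use of integer ratings and a sum of absolute differences are assumptions made only for illustration.

```python
def self_ideal_discrepancy(self_ratings, ideal_ratings):
    """Sum of absolute differences between self and ideal ratings.

    Assumed scoring rule: a larger total means a larger discrepancy,
    that is, lower self-acceptance as the term is defined in the text.
    """
    if len(self_ratings) != len(ideal_ratings):
        raise ValueError("each statement needs both a self and an ideal rating")
    return sum(abs(s - i) for s, i in zip(self_ratings, ideal_ratings))


# Hypothetical ratings of one child on four descriptive statements.
self_concept = [3, 2, 4, 1]
ideal_self = [4, 4, 4, 3]
print(self_ideal_discrepancy(self_concept, ideal_self))
```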

The Ss were 184 pupils in eight sixth 
grade classes in different elementary 
schools of a medium-sized city in Iowa. 
Four of the eight classes had undergone 
a planned learning program designed to 
help each pupil acquire an understanding 
of the dynamic, variable, and complex na- 
ture of human motivation. 

The other measures used in this investi- 
gation included the Children’s Manifest 
Anxiety Scale and the Kooker Security- 
Insecurity Scale (ratings made by trained 
observers in the classrooms). 

The results indicated a statistically
significant relationship (beyond the .05 level)
between self-acceptance, as measured by
the Discrepancy scores, and the measures
of manifest anxiety and observed insecurity
such that those with the smaller
Discrepancy scores had average scores
indicating less anxiety and less insecurity.
However, a tendency was noted for
interaction effects to be present indicating that
this relationship between self-acceptance
and measures of anxiety and insecurity
might have been operating differentially
between the experimental and control
classes.

When analyses were made of those
experimental classes in which most of the
pupils had had the program for two
consecutive years, support was found for the
contention that although Self-Ideal
Discrepancy scores remained high for some
of these pupils, the Anxiety scores were
significantly lower than those of either the
one-year experimental or the control
classes. This finding suggested that the
statement of relationship between
self-ideal discrepancy and anxiety might have
to be qualified to apply to Ss with
insufficient self-understanding. Also, the
two-year experimental group obtained average
scores which indicated less manifest
anxiety and less observed insecurity than did
the average scores obtained by either the
one-year experimental group or control
group.


An organism’s adjustment to a new
situation may be scrutinized for the
variability or fixedness of the behavior
employed. Presumably, during the early
stages of solving a novel problem,
behavior is characterized by its variability;
as commerce with the problem increases
and correct solutions are achieved, the
resulting habits act to decrease
subsequent variability. In a recent statement,
Scott (12, p. 61) points out that such
a sequence is characteristic even of the
simplest organism:


Such variability can be seen even in the
lower organisms. If a paramecium runs into
an obstacle it does not keep repeating its
behavior. It backs off and approaches from
a different angle, and never does exactly
the same thing again. It is apparent that
variability of this type is necessary for the
process of adjustment, since an animal
which gave fixed invariable responses could
never adapt itself to a variety of changing
conditions.


The problem solving model presented
by the Paramecium might well be envied
by humans. There is ample experimental
evidence that under some circumstances
human problem solving is characterized
by fixedness rather than variability of
response. If the demands of the problem
situation are similar to previously
acquired solution methods, the transfer
effects to novel situations should, of course,
be positive. Where the demands of the
problem are only superficially similar to
previous successful solutions the present
application of these methods is often
doomed to failure. The next adaptive step
would be to abandon the first method,
or hypothesis, and to try another, in
contrast to the persistent application of
the first.


¹ This study is part of an Ed.D. thesis
submitted to the Graduate School of
Education, Harvard University, by the first
author. The research was performed when
both authors were members of the Harvard
Teacher Education Research Project,
supported by grants from the Fund for the
Advancement of Education.

Fixedness in problem solving has been 
variously attributed to immediately prior 
problem solving experiences, to the na- 
ture of the problem at hand, and to per- 
sonality dispositions of the solver (1, 4, 
5, 8). 

One major cause of lack of variability 
is problem solving set. Of course, should 
the person’s set be appropriate for the 
problem the solution is likely to be fa- 
cilitated. Under the not infrequent con- 
ditions where his set is inappropriate, the 
problem solver has a dual task before
him: ridding himself of the maladaptive
set and then applying the new solution.
Recognizing Gibson’s (3) indictment
against the chaotic usage of the concept 
of set, we will define it for our purposes 
as “that manner of attacking a problem 
which is carried over from a previous to 
a succeeding problem situation.” 

Set precludes the variability of be- 
havior which is essential if the habituated 
tack is to be cast off. This study is concerned
with a way of training children
in problem solving so that in the face
of new problem situations they will either 
not develop sets or will give them up 
when they do not work. 

What the problem solver needs is a
nonspecific approach (sometimes also called
“mode of attack” or “plan of action”)
to many types of problems. One
important aspect of this general approach 
is the assumption that problems often 
lend themselves to various solutions, so 
that if one solution does not work there 
is still the possibility that the problem 
may be solved. The solver might profit- 
ably attack the problem again, with a 
new orientation. How may such general 
problem solving behavior be inculcated? 
Maier (7) gave Ss specific training in
reasoning, and Luchins (6), prior to his
Einstellung situation, instructed his Ss,
“Don’t be blind.” Both investigators report
that these devices aided Ss in overcoming
habituated ways of solving problems.

This study investigates another method 
of inducing variability in problem solv- 
ing. One group of Ss is taught two 
solutions to the same set of problems, 
another group one solution to these prob- 
lems. Then the two groups are observed on 
a set of similar problems necessitating new 
solution methods. Next, a set is induced 
on a novel series of problems and the 
responses to situations where this set no 
longer works are observed. Our expecta- 
tions are that the people trained in al- 
ternative solutions will solve more test 
problems correctly, will exhibit greater 
variability to these problems, and will 
persevere longer on problems too difficult 
for them to solve. 


METHOD


The Ss were 48 sixth grade children,
25 boys and 23 girls, who lived in a
predominantly middle class, white collar
community.

Ss were divided into two groups of
24 children each, with the sexes
represented as equally as possible in each
group. The two groups did not differ
either in intelligence or school
achievement. E took each child individually from
his classroom to a nearby experimental
room. E told the child that he was
interested in “how children solved
problems.” The procedure consisted mainly
of training and test series on two separate
types of paper and pencil problems.

Problem 1. The water jar problems.
These problems are an adaptation of the
water jar problems described by Luchins
(5). S is required to obtain a specified
amount of water using only empty jars
and their total capacities as measures.
The jars were drawn on cards, with a
number in each jar denoting its capacity
and a number in the margin indicating
the amount of water to be obtained. E
presented a single problem on a card,
and S solved the problem on a work
sheet.

The training series consisted of 10
problems, each solvable by the formulas
B - A - C or B - 2A + C (the letters
refer to the capacities of the three jars).
E taught S to solve the problems by
one or both of these solution methods
depending on the experimental design
below.
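
The two training formulas can be written out as a short sketch. The jar capacities below are hypothetical, since the actual training problems are not listed in the text. One point follows from the algebra alone: B - A - C and B - 2A + C give the same amount only when A = 2C, so a problem solvable by either method must use jars in that ratio.

```python
def method_one(a: int, b: int, c: int) -> int:
    """Fill B, pour off one A and one C: B - A - C."""
    return b - a - c


def method_two(a: int, b: int, c: int) -> int:
    """Fill B, pour off A twice, add one C: B - 2A + C."""
    return b - 2 * a + c


def solvable_both_ways(a: int, b: int, c: int, target: int) -> bool:
    """True when both training formulas yield the required amount."""
    return method_one(a, b, c) == target and method_two(a, b, c) == target


# Hypothetical training problem: jars of 10, 40 and 5 units, target 25.
print(method_one(10, 40, 5), method_two(10, 40, 5))
print(solvable_both_ways(10, 40, 5, 25))
```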

Following the training problems and a
repetition of the general instructions, S
was presented six test problems. Each
of the first five called for a novel
solution method; the sixth was solvable by
the method taught during the training
series.

S was permitted to work on each water jar
problem until he solved it or until he
decided to stop. The time spent on each
problem was recorded.

Problem 2. The puzzle problems.
Immediately following the completion of the
above training and test series the Ss were
given 13 puzzle problems. These are paper
and pencil versions of jigsaw puzzles. S
was given a booklet on each page of
which was a picture of a whole figure.
Beneath the figure were pictured
numbered fragments of various sizes and
shapes. The problems call for S to choose
those fragments which, when put
together, will form the whole figure. Ss
indicated their solutions at the bottom
of the page by marking the numbers of
the pieces they chose, in the order in
which they used them.

The first eight of the 13 puzzles were 
all solvable by assembling the fragments 
numbered 1, 3, 5, and 7. Each of the 
five final problems required a different
solution from the initial set-inducing
series and from each other. Superficially,
the test problems looked as though they 
were solvable by the original method. To 
the S, there was no break between the 
training and the test problems. 

No time limit was set on these
problems; a record of the time spent on each
problem was kept.

Experimental manipulations. The two
groups of Ss differed only in the
treatment they received during the training
series of 10 water jar problems. One
group was taught the two alternative
solutions to these problems (labelled A
group). E worked the first problem by
one and then the other solution method,
and half of the succeeding problems were
worked out by one or the other method.
The no alternatives group (hereafter
labelled NA) were taught to solve the 10
problems by only one solution method.


RESULTS 


Stated generally, our hypotheses are
(a) that experience in solving problems
in more than one way should dispose the
S to relinquish more quickly a maladaptive
set and consequently yield more correct
solutions to new problems, and (b)
that this variability of response should
generalize to problems different from
those on which the dual solutions were
taught.

Consider first the six test problems in 
the water jar series. Group A solved 111 
problems correctly, whereas Group NA 
produced 94 correct solutions. Though the 
direction of difference is as predicted, the
difference is not statistically significant. 
More critical than the total number of 
correct solutions is the behavior on the 
first test problem. A correct answer to 
this problem indicates to the S that the 
solution methods of the training period 
may not be applicable to the ensuing 
problems and hence may aid him in over- 
coming whatever set was induced by the 
earlier series. Group A achieved 12 cor- 
rect answers; group NA, 9. Here too the 
direction is interesting, but the differences 
are not significant. 
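
The first-problem comparison (12 of 24 correct in Group A, 9 of 24 in Group NA) can be checked with a 2 x 2 chi-square. The text does not state which test the authors applied, so the uncorrected chi-square used below is an assumption, and the function name is hypothetical. The .05 critical value for one degree of freedom is 3.841.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: Group A, Group NA. Columns: correct, not correct (24 Ss per group).
chi2 = chi_square_2x2(12, 12, 9, 15)
print(f"chi-square = {chi2:.3f}")
print("significant at .05" if chi2 > 3.841 else "not significant at .05")
```

Under these assumptions the value falls well short of the critical value, in line with the statement that the difference is not significant.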

The results are clearer when we compare
the two groups in the number of
other-than-training solutions they offered
to the six test problems. Since there is
abundant evidence that the sexes differ
in their problem solving behavior, “sex”
was added to “treatment” as a criterion of
classification in all the analyses of variance
reported.² As can be seen in Tables 1
and 2, Group A offered significantly more
“other than training” solutions than did
Group NA. We may infer then that training
in alternative solutions leads to more
variable problem solving behavior, though
it is not clear that the increased
variability yields more correct solutions.

No time limit was set on any water
jar problem. The instructions directed each
S to work on a problem until either a
satisfactory answer was achieved or until
he thought that it was no longer
profitable to continue on that problem. The
amount of time spent on a problem
before giving up was, then, a measure of
perseverance. Table 3 gives the number
of Ss in each sex and group who gave
up the problems before actually achieving
a correct solution and the time in
seconds spent on the problems before
giving up. The analysis of variance
summarized in Table 4 indicates that boys
persevered longer than girls and that
Group A persevered longer than Group
NA. The greater perseverance of boys
confirms other findings about sex
differences in perseverance (10, 11). The
training in multiple possibilities of
solving a problem likely taught the A group
that failure with one method did not
exhaust the possibilities of success, so
that they might profitably spend
additional time on the problem. This finding
complements Robinson’s (9), who reports
that Ss who thought their chances of
arriving at a correct solution to a problem
were excellent spent more time on that
problem than did Ss who were less
confident.

The first part of the experiment
confirms several of our expectations. There
was a tendency for the group trained in
alternatives to solve more problems
correctly. This group offered a greater
variety of solutions to the test problems,
and on those problems they could not
solve they worked longer before they
gave up.

The puzzle problems were introduced
to test whether the effects of training in
alternative solutions would transfer to a
novel set of problems. Each of the first
eight puzzles was solvable by the use
of, in each case, pieces numbered 1, 3, 5,
7. The succeeding five puzzles required
different pieces for their correct
construction. Does training in an earlier, different
type of problem influence behavior on
the new test problems?


² A correction factor for the unequal cell
frequencies was applied to all of the
analyses of variance.


TABLE 1

MEAN NUMBER OF OTHER-THAN-TRAINING
SOLUTIONS TO TEST WATER
JAR PROBLEMS

           Group A    Group NA
Male       9.33       7.15
           (12)       (13)
Female     11.42      7.91
           (12)       (11)

Note.—Numbers in parentheses are the numbers of
Ss in each cell.


TABLE 2

SUMMARY OF ANALYSIS OF VARIANCE OF
OTHER-THAN-TRAINING SOLUTIONS
TO TEST WATER JAR PROBLEMS

Source             df    Sum of Squares   Mean Square   F
Sex                1     2.03             2.03          1.22
Treatment          1     8.09             8.09          4.87*
Sex X Treatment    1     0.44             0.44
Error              44    73.09            1.66

* P < .05.


TABLE 3

MEAN PERSISTENCE (IN SECONDS) ON TEST
WATER JAR PROBLEMS

           Group A       Group NA
Male       343.43        178.11
           (7)           (9)
Female     [illegible]   [illegible]


TABLE 4

SUMMARY OF ANALYSIS OF VARIANCE OF PERSISTENCE
ON TEST WATER JAR PROBLEMS

Source             df    Sum of Squares   Mean Square   F
Sex                1     9495.53          9495.53       7.55*
Treatment          1     20777.78         20777.78      16.53**
Sex X Treatment    1     4607.02          4607.02       3.66
Error              30    37,705.80        1256.86

* P < .01.
** P < .001.
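
The entries in an analysis-of-variance summary such as Table 4 are related by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and each F is an effect mean square divided by the error mean square. The sketch below recomputes the Table 4 ratios from the printed sums of squares. The function name is hypothetical, and the correction for unequal cell frequencies mentioned in the footnote is not reproduced here.

```python
def f_ratios(effects, error_ss, error_df):
    """Return the error mean square and an F ratio for each effect.

    effects maps an effect name to (sum of squares, degrees of freedom).
    """
    ms_error = error_ss / error_df
    ratios = {}
    for name, (ss, df) in effects.items():
        mean_square = ss / df
        ratios[name] = mean_square / ms_error
    return ms_error, ratios


# Sums of squares and degrees of freedom as printed in Table 4.
table4_effects = {
    "Sex": (9495.53, 1),
    "Treatment": (20777.78, 1),
    "Sex X Treatment": (4607.02, 1),
}
ms_error, ratios = f_ratios(table4_effects, error_ss=37705.80, error_df=30)

print(f"error mean square = {ms_error:.2f}")
for name, f_value in ratios.items():
    print(f"{name}: F = {f_value:.2f}")
```

The recomputed values agree with the printed table to within rounding in the second decimal place.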

As with the water jar problems, the
A group reached more correct solutions
than did their controls. They solved 61
problems correctly; Group NA solved
49. On the first test problem, there were
four correct solutions in the A group
and none in the NA group. Though
neither of these differences is statistically
significant, their direction was predicted.

If the set induced by the initial eight
jigsaw problems is effective, the Ss should
continue to use the same numbered pieces
in the test problems, where they are now
inappropriate. Table 5 gives the mean
numbers of pieces with the same numbers
as those used in the training puzzles for
each group. The analysis of variance for
these data is summarized in Table 6. The
A group evidenced greater variability by
using fewer incorrect pieces than did
the NA group. This finding permits us
to conclude that the ability to overcome
inappropriate problem solving sets is
acquired through alternative training in
solution methods to the same problems
and that this ability is transferable to
problem situations other than the one
in which the alternative training was
acquired.


TABLE 5

MEAN NUMBER OF SAME-AS-TRAINING
PIECES (PERSISTENCE) USED IN
SOLVING PUZZLES

           Group A    Group NA
Male       2.33       3.08
           (12)       (13)
Female     1.83       3.18
           (12)       (11)


TABLE 6

SUMMARY OF ANALYSIS OF VARIANCE OF
NUMBERS OF SAME-AS-TRAINING PIECES
USED IN SOLVING PUZZLES

Source             df    Sum of Squares   Mean Square   F
Sex                1     .04              .04
Treatment          1     1.10             1.10          12.37*
Sex X Treatment    1     .09              .09
Error              44    3.91             .089

* P < .001.


DISCUSSION


[Illegible] impressions about the water jar
problems, peripheral to our interests but
germane to Luchins’ (5) results, occurred
during the experiment. Luchins, it
will be recalled, creates a set to use
combinations of three water jars during his
training problems and during the test
series many Ss are not able to use simpler,
two jar solutions. The interpretation is
that set precludes Ss from seeing the
easier solution method. In some instances
in the present study it was not that the
Ss did not see the simpler solution, but
rather they did not understand that they
were permitted to use two jars. Some of
our Ss actually tried and abandoned
correct two jar solutions because, as they
explained it, they thought that since the
initial problems involved three jars, the
continued use of all three was a part of
the task. It may be, therefore, that in
the previous water jar studies, there were
people who were actually aware of the
simpler solution but were not clear that
it was acceptable.

The pedagogical implications of our 
findings are apparent. Although it in- 
volves an extrapolation from laboratory 
to classroom conditions, we might expect 
that a teacher’s consistent training in 
alternative solutions to problems might 
result in efficient overcoming of set.

Put into historical perspective, the findings
of this study are congruent with
Woodworth’s general factors theory of 
transfer. Apparently, what occurs when 
the child is given even brief training in 
solving problems by more than one 
method is that he develops a “general 
approach to novel problems.” Colloqui- 
ally, this approach is self-instruction to 
abandon an inefficient solution method 
and to try something new. One wonders 
whether this general problem solving skill 
was verbalized by our experimental Ss 
or whether it operated as an unverbalized, 
functional concept. Although we have no 
data on this point, we might expect that 
the verbalization of the principle would 
increase its efficiency. 


SUMMARY 


Two groups of sixth grade children were 
trained to solve 10 water jar problems. 
One group was given a single solution 
to the problems; the second was taught 
that the problems were solvable by either 
of two methods. On a succeeding set of 
test problems which necessitated solutions 
different from the two training methods, 
the alternative group tended, though not 
significantly, to solve more problems cor- 
rectly, offered significantly more other- 
than-training solutions, and persevered 
longer on problems they were unable to 
solve. 

To test the transfer effects of the orig- 
inal training, Ss were given 13 jigsaw 
puzzles: the first eight were all solvable 
by a single method so that a set might 
occur; the final five required various so- 
lution methods. Here, also, the Ss who 
were trained in alternative methods on 
the earlier water jar problems evidenced 
greater problem solving variability by
using significantly more pieces of the
puzzles which were not used in the training
series.


THE EFFECTS OF DIFFERENT TEACHING METHODS:
A METHODOLOGICAL STUDY¹


MARVIN NACHMAN² AND SEYMOUR OPOCHINSKY
University of Colorado


Reviews of teaching research have
consistently concluded that different teaching
procedures produce little or no difference
in the amount of knowledge gained by the
students (1, 2, 3, 5, 6). This same
conclusion has been reached despite the fact
that experimenters have employed a wide
variety of independent variables, such as
lecture versus discussion classes,
instructor-centered versus student-centered
classes, large versus small classes, various
types of TV classes, etc. These results are
surprising if one considers that much of
this research was instigated by the
hypothesis that differences would be found.
Furthermore, it appears as if most educators
still assume that classroom techniques do
in fact have specific effects. Why then have
differences not been found?

One obvious hypothesis is that the
teaching methods which have been employed
were not sufficiently distinct to produce
significant differences in the amount of
knowledge acquired. The purpose of this paper
is to examine an alternative hypothesis,
namely, that the different teaching
methods have, in fact, produced differential
amounts of learning but that these effects
have been masked in the measurement
process.

Typically, in measuring the effectiveness
of different teaching methods, one of the
major dependent variables has been the
performance of the students on the final
examination. It is clear that variance on
the final examination is due to many
factors in addition to the specific teaching
methods employed. Such things as the
intellectual ability of the student, his
motivations, the amount of studying he has
done outside of class, various personality
factors and environmental pressures, etc.
will also affect his performance. In
research on teaching methods, most of these
other variables are not controlled
(usually, the major control is to equate
students for ability) and, since they
undoubtedly account for a significant proportion of
the variance, it is perhaps not surprising
that the diverse experiments on teaching
methods have so consistently concluded
that there are no significant differences as
a function of teaching method employed.
A basic difficulty of this conclusion stems 
from the fact that students prepare for 
examinations by studying outside of class. 
It is perfectly possible that two groups 
taught under different procedures may 
learn very different amounts in class, but 
when both then engage in significant extra 
amounts of study, the difference in per- 
formance becomes negligible. This is es- 
pecially true since most final examinations 
are based, at least in part, on information 
to be found in the student’s textbook. 
Thus, if both groups study the textbook 
equally, one might expect that they will 
do about equally well on the examination 
even though one group has learned more 
in class. The matter can, of course, be more 
complex. It may be, for example, that if 
Group A learns more in class than Group 
B, then Group B feeling itself less well 
prepared for the pending examination will 
study more than will Group A. That is, 
one might find that students in preparing 
for an examination, study until they feel 
they know the material to a certain de- 
gree, and if one group learns the material 
in class they will feel less need for
additional study than the group which has
not learned the material in class. 

This discussion suggests that to evalu- 
ate a particular teaching method effec- 
tively, testing of the students ought to oc- 
cur immediately after the method has been 
employed thereby avoiding contamination 
by the additional variable of outside study. 
This procedure would be a more direct 
test of the influences of the teaching situ- 
ation per se and would therefore be more 
likely to reveal its effects, even if these 
effects are not observable on a final exam- 
ination. 

The following experiment was designed 
to test this general prediction. The inde- 
pendent variable of class size was em- 
ployed and it was hypothesized (a) that 
students in a small class would do better 
than students in a large class on quizzes 
for which they had not prepared, and (b) 
that no differences would be found be- 
tween the classes when students were given 
an opportunity to study for an examina- 
tion. The experiment was not specifically 
concerned with the variable of class size. 
Rather, it was used because it offered a 
convenient and simple way of testing the 
hypotheses and because it has so consist- 
ently been shown to have no influence on 
the amount learned by students when tra- 
ditional measurements have been em- 
ployed. In one review on the effects of class 
size, the author concludes that there have 
been “... more than 200 studies which 
clearly reveal that there are no consistent 
differences” (4). 


METHOD
Subjects 


Two sections of about 150 students each 
were enrolled in the senior author’s Gen- 
eral Psychology course. The day after the 
students returned from their Christmas 
vacation, they were informed that for the 
last two weeks of the semester the instruc- 
tor intended to meet a small class which 
would be identical in all respects to the
larger classes except for size. The class 
was arranged for Monday, Wednesday, 
and Friday at 8:00 A.M. and 42 of the 
300 students volunteered to attend that 
class rather than their large one. These 
students were divided into two groups of 
21 students and each student in one group 
was very closely equated with a student 
in the other group on the basis of the three 
hourly examinations they had previously 
taken in the course that semester. By a
flip of a coin, one group of 21 was then
selected for the small class and the other 
group remained in one of the two large 
classes. Thus, the Ss for the experiment 
consisted of two equated groups of 21 
students each, one group in a small class 
of 21 students and the other group in
large classes of about 140 students. 


Procedure 


Since the experiment was not concerned 
with testing the unique advantages of a 
small class versus a large class, the
teaching method used in the different classes
was as nearly identical as possible. The
small class met on MWF at 8:00 A.M. and the
two large classes met on MWF at 10:00
and 11:00 A.M., respectively. It was
arranged that the small class meet earlier in
the day than the large classes so that, in
the event the students communicated
information to each other about
examinations, the bias introduced would operate
against the hypotheses. (Questioning of
the students after examinations revealed 
that the amount of communication was 
just about nil.) 

All three sections were conducted by the 
lecture method and, as much as possible, 
the lecturer repeated exactly the same
material to the three classes. During the
last 10 minutes of the second and fourth
lectures, the students in all three classes
were given a “pop-quiz” which specifically
covered the material that was presented in
that lecture and the previous lecture. The
students had typically received a few
weeks' notice for examinations before this
time and were clearly not expecting and
had not prepared for these quizzes.

In order to avoid biasing the lectures
in favor of the quiz material, the quizzes
were constructed by the junior author
and were not seen by the lecturer until
after they were administered. Both quizzes
were multiple choice, the first containing
15 questions and the second 8 questions.
(The second quiz was originally designed
as 15 questions but since only 8 of the
items were actually covered in the lectures,
the rest were discarded before the quizzes
were scored.)

The students took the final
examination about one week after the last day of
classes. The final examination consisted of
125 multiple choice items and was made
up by several members of the department.
It was administered at the same time to
about 1,000 students, of whom 42 were
the Ss for this experiment. Twenty-five
of the items on the final examination
covered the same material which had been
presented while the students were in the
large versus small class experimental
situation. The two groups of 21 students were
compared in their performance on the two
quizzes and on these 25 items of the final
examination.


RESULTS


The two quizzes and the final
examination were scored by assigning one point
for each correct answer. For each student,
a total quiz score was obtained by adding
the scores of his two quizzes, and
difference scores were then computed for each
matched pair of students in the small and
large classes. Because 13 of the 42
students were not present for one or the other
of the two quizzes (seven students from
the large classes and six students from the
small class), the difference scores for these
students were based on the one
comparable quiz which both members of the
matched pair had taken. The difference
scores revealed that 13 of the small class
students did better than their matched
pairs in the large classes whereas only 3
did worse and there was no difference for
5 students. The differences analyzed with
a t test for matched pairs resulted in a t
of 2.57, which for 20 degrees of freedom
has a probability level of less than .02.
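
As a minimal sketch of the matched-pairs t test described above, the function below computes t from a list of difference scores (small-class score minus large-class score for each matched pair). The individual difference scores are not reported in the article, so the list used here is purely hypothetical, and the function name `matched_pairs_t` is an illustrative assumption rather than anything taken from the study.

```python
import math
from statistics import mean, stdev


def matched_pairs_t(differences):
    """Return (t, df) for a matched-pairs t test on difference scores."""
    n = len(differences)
    # Standard error of the mean difference, using the sample standard deviation.
    se = stdev(differences) / math.sqrt(n)
    return mean(differences) / se, n - 1


# Hypothetical difference scores for eight matched pairs; NOT the article's data.
example_diffs = [3, 1, 0, 2, -1, 4, 0, 2]
t_value, df = matched_pairs_t(example_diffs)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

With 21 matched pairs, as in the experiment, the same function would return 20 degrees of freedom, the value cited for the reported t of 2.57.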

An alternative method of analyzing the 
results was to correlate the two quizzes 
(the Pearson product-moment coefficient 
was .56) and to use the regression equa- 
tion to predict the scores on the quizzes 
missed by the absent students. This re- 
sulted in two quiz scores for each student 
which were added. The mean of this sum 
was 16.02 for the small class as compared 
to 14.34 for the matched students in the 
large classes. The standard error of the
difference between the means was .56
which resulted in a t of 3.00 which for
seven degrees of freedom (since 13 of the 
scores were predicted) also has a prob- 
ability level of less than .02. 

On the 25 items of the final examina- 
tion, 10 of the small class students did 
better than their matched pairs in the 
large classes while nine did worse, and 
there was no difference for two students. 
The mean for the small class was 18.00 as 
compared to 17.53 for the matched stud- 
ents in the large classes. The standard 
error of the difference between the means 
was .90 which resulted in a t of .53 which
is not significant. 
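
The two t ratios built from group means can be rechecked from the summary figures alone. The sketch below simply divides each reported mean difference by its reported standard error; the helper name `t_from_summary` is an assumption introduced for illustration. It gives about 3.00 for the combined quiz scores and about .52 for the 25 final examination items, close to the reported .53 (the small gap may reflect rounding of the published means).

```python
def t_from_summary(mean_small, mean_large, se_difference):
    """t ratio from two group means and the standard error of their difference."""
    return (mean_small - mean_large) / se_difference


# Figures as reported in the text above.
quiz_t = t_from_summary(16.02, 14.34, 0.56)   # summed quiz scores
final_t = t_from_summary(18.00, 17.53, 0.90)  # 25 final examination items

print(f"quiz t = {quiz_t:.2f}, final examination t = {final_t:.2f}")
```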


Discussion 


The results confirmed the hypotheses 
that differential performance would be 
found on quizzes which specifically cov- 
ered classroom material and for which the 
students had not prepared but that per- 
formance would be equal on final exam- 
inations for which the students had de- 
voted a large amount of extra study. One 
cannot be certain, of course, about the 
role attributed to extra study. The
specific conclusion which can be drawn from
the data is that in order to test the effect
of a particular classroom technique,
evaluation should be done immediately after
the technique is employed. Waiting until 
a final examination (or any other an- 
nounced examination) confounds the prob- 
lem by permitting many other variables to 
operate, one of the most obvious of which 
is extra study. 

In most experiments on learning, when 
one is measuring the effects of a particular 
variable, e.g., massed versus distributed 
practice, the learning opportunities of the 
Ss are limited almost exclusively to the 
experimental situation. This has not been 
the case in experiments on teaching meth- 
ods, perhaps because of the difficulties of 
controlling some of the extraneous vari- 
ables. It may be, however, that many of 
the experiments on teaching methods 
would have led to significant differences if 
the evaluation procedure avoided contam- 
ination by such factors as extra study. The 
fact that in previous studies, the variable 
of class size has so repeatedly been found 
to produce no significant effects on amount 
learned, and yet yielded significant differ- 
ences in the present experiment, implies 
that a restudy of other teaching variables 
using different measuring techniques might 
be more fruitful than it has previously 
been. 

Although the experiment was not pri- 
marily concerned with the effects of small 
versus large classes, the data indicate that 
students in the small class learned signifi- 
cantly more in class than did the students 
in a large class. The difference of about 1.7 
points on a test which had a mean score 
of about 15 was fairly large considering 
the relatively small variance of the differ- 
ence scores and the fact that the small 
versus large class variable was limited to 
only two lectures per quiz. Furthermore, 
there was no attempt to utilize the unique 
advantages of a small class by permitting 
more questions or discussions. 

What was the specific variable operating
in the large and small classes which pro- 
duced the difference? As Buxton (2) has 
pointed out, class size, like time, is not a 
variable by itself but rather is an abstrac- 
tion in which other variables may be per- 
mitted to operate. In evaluating the course 
at the end of the semester, almost all of 
the students in the small class spontane- 
ously commented that they found it easier 
to pay attention in the small class and 
that they had become more interested. 
Casual observations by the lecturer cor- 
roborated this. The students in the small 
class appeared to be much more alert in 
their listening as well as note-taking be- 
havior. Rarely, if ever, were they observed 
in behaviors which are not uncommon in 
very large classes, such as talking to each 
other, staring out the window, reading, etc.
Undoubtedly, the very proximity of the 
lecturer in the small class acts as an in- 
hibitory influence on these behaviors. 

It is also possible, of course, that the 
variable of class size had nothing to do
with the obtained differences. It may be
that something like a “Hawthorne effect”
was operating in which the students in the
small class felt more significant and more
highly motivated because a special class
had been established for them. This
possibility, and similar ones, such as
unintentional lecturer bias, or that students
perhaps learn more at 8:00 a.m. than at 10:00
or 11:00 a.m., do not reduce the
significance of the major finding, namely that
differences as a result of teaching
technique manipulations were found on “pop
quizzes” but not on a prepared-for final
examination.


SUMMARY AND CONCLUSIONS


Twenty-one students in a small class
were compared on examination
performance with a matched group of students
who were in a large class. It was
hypothesized that the small class would do better
on quizzes which specifically covered the
classroom material and for which the
students had not prepared but that the two
groups would do equally well on final ex- 
aminations for which they had studied. 
The hypothesis was confirmed and the im- 
plications of this methodological procedure 
were discussed in relation to other research 
on the effectiveness of different teaching 
methods. 


CURRICULAR DIFFERENCES IN JOB INCENTIVE
DIMENSIONS AMONG COLLEGE STUDENTS 


A. W. BENDIG AND EUGENIA L. STILLMAN 
University of Pittsburgh 


In a recent study (1) the results of a 
preliminary attempt to identify the di- 
mensions of job incentives among college 
students were reported. By means of a
factor analysis of rankings of a homogene- 
ous list of job incentives by college Ss three 
dimensions were isolated and tentatively 
identified as: (a) need achievement vs. 
fear of failure; (b) interest in the job it- 
self vs. the job as an opportunity for ac- 
quiring status; and (c) job autonomy of 
supervision vs. supervisor dependency. 
Simple procedures were developed for com- 
puting factor “scores” for individual Ss 
from the differences between pairs of 
ranked incentives. 

If the dimensions underlying the ranked 
incentives are related to choice of occu- 
pational goal among college Ss, we would 
expect that Ss divided into more homo- 
geneous subgroups on the basis of their 
curricular courses of study would show 
somewhat greater agreement in their rank- 
ing of the incentives than the combined 
curriculum heterogeneous total group of 
Ss and that the factor scores would be 
capable of demonstrating significant dif- 
ferences among the curriculum subgroups. 
Curriculum subgroup differences would 
have the added advantage of helping to 
define the meaning of the job incentive 
dimensions. 

The present research was designed to 
(a) identify, on an a priori basis, curricu- 
lar subgroups of Ss, (b) combine these 
subgroups into major curriculum group- 
ings by means of an empirical factor anal- 
ysis of their subgroup profiles in ranking 
job incentives, and (c) test whether there 
were significant differences among these 
major curriculum groupings on the de- 
rived factor scores. 


PROCEDURE 


A form was prepared which listed the 
eight incentives used previously (1) and 
it was administered to 267 undergraduate 
college Ss (174 men and 93 women) en- 
rolled in 10 sections of undergraduate psy- 
chology courses. The form requested the S 
to indicate his (her) name, age, sex, school, 
and curriculum group within the univer- 
sity, major subject and to write a brief de- 
scription of the job he expected to accept 
after graduation. The S was then requested 
to rank (from one to eight) the following 
list of incentives with the incentive that 
would be most important to him in
selecting the job previously described being
ranked “one” and the least important
incentive being ranked “eight.” The
incentives used were:

1. Opportunity to learn new skills
2. Friendly fellow workers
3. Freedom to assume responsibility
4. Good job security
5. Good prospects for advancement
6. Full insurance and retirement benefits
7. Recognition from supervisors for initiative
8. Good salary

The 267 Ss who completed this form 
were dichotomized as to sex and further 
divided into eight major curriculum areas: 

Business Administration: 51 men.
Engineering and Mines: 42 men.
College B.A. (Humanities and Social Sciences): 35 men and 11 women.
College B.S. (Natural Sciences): 30 men and 11 women.
Pre-Education: Elementary; 26 women.
Pre-Education: Secondary; 16 men and 25 women.
Pre-Nursing: 11 women.
Nursing Education: 9 women.

This division by sex and curriculum re- 
sulted in 11 curriculum subgroups. 

Two sets of “scores” were available for 
each S. The eight rankings of the incen- 
tives constituted an incentive profile for 
the S and these rankings were averaged 
for each of the 11 subgroups. A second set 
of factor “scores” had been developed in 
a previous study (1). The Factor A
score was defined as the ranking of
Incentive 6 minus the ranking of
Incentive 1. Similarly, the Factor B score
was Incentive 8 minus Incentive 2 and
Factor C was Incentive 7 minus
Incentive 4. Positive scores computed in
this manner indicate (a) high need
achievement, (b) high interest in the job
itself, and (c) high need for freedom
from supervision. Negative factor scores
presumably measure (a) high fear of
failure, (b) strong attitude toward the
job as a stepping-stone for advancement,
and (c) high need for a dependency
relation to the job supervisor.
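
The difference-score definitions above can be sketched in a few lines. The function name `factor_scores` is hypothetical. Factor A (Incentive 6 minus Incentive 1) and Factor C (Incentive 7 minus Incentive 4) follow the text; Factor B is taken here as Incentive 8 minus Incentive 2, which is an assumption checked only against the group means reported later in the article. Because each score is a simple difference, applying the function to a subgroup's mean ranks gives that subgroup's mean factor scores: for the College B.A. women (mean ranks as read from Table 1) the result is roughly 2.7, 1.1, and 0.1, in line with the Group VI entries of 2.73, 1.09, and .09 in Table 4.

```python
def factor_scores(ranks):
    """Compute the three factor "scores" from one set of incentive ranks.

    `ranks` maps incentive number (1-8) to its rank (1 = most important).
    """
    return {
        "A": ranks[6] - ranks[1],  # positive: need achievement
        "B": ranks[8] - ranks[2],  # assumed pairing; positive: interest in the job itself
        "C": ranks[7] - ranks[4],  # positive: need for freedom from supervision
    }


# Mean ranks for the College B.A. women (n = 11), as read from Table 1.
college_ba_women = dict(zip(range(1, 9), [4.1, 3.9, 2.7, 4.4, 4.6, 6.8, 4.5, 5.0]))
scores = factor_scores(college_ba_women)
print({name: round(value, 2) for name, value in scores.items()})
```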

The average (mean) incentive profiles
for each of the 11 curriculum subgroups
can be found in Table 1. From the sums 
of ranks of the eight incentives for each 
subgroup the average rank-difference 
(rho) intercorrelation among the Ss con- 
stituting each subgroup was computed by 
the usual formula (2, p. 421) and these 
coefficients are given in the last column 
of Table 1. Nine of the 11 coefficients are 
significantly different from zero at the .01 
level of confidence with the remaining two 
coefficients being significant at the .05 
level. The average rho intercorrelation for 
the total group of 267 Ss (ignoring cur- 
riculum subgrouping) was .20 (significant 
at the .01 level). The median average inter- 
correlation of the 11 subgroups in Table 
1 is .28 which can be compared to the 
average intercorrelation of .20 when the 
curriculum subgroups are combined, indi- 
cating that the division of Ss into curricu- 
lum subgroups did somewhat increase the 
homogeneity with which the Ss ranked
the incentives.


TABLE 1
MEAN RANKS OF INCENTIVES AND AVERAGE INTRAGROUP
CORRELATIONS (RHO) FOR CURRICULUM SUBGROUPS

                             Sex    No. of                Incentives                 Average
Curriculum Subgroup          Group  Students   1    2    3    4    5    6    7    8  Intercorrelation
Business Administration       M       51      5.6  5.3  4.6  4.1  1.8  6.7  4.9  3.0   .38**
Engineering & Mines           M       42      4.2  4.8  4.8  4.3  2.6  7.0  5.1  3.2   .26**
College B.A.                  M       35      5.3  5.1  4.1  3.8  3.2  6.3  5.3  3.0   .21
College B.A.                  F       11      4.1  3.9  2.7  4.4  4.6  6.8  4.5  5.0   .14*
College B.S.                  M       30      4.6  5.4  3.1  2.8  3.5  6.5  5.9  4.2   [illegible]
College B.S.                  F       11      2.6  5.0  3.5  3.5  3.9  7.5  5.5  4.4   [illegible]
Pre-Education: Elementary     F       26      3.7  4.0  3.1  2.6  5.4  6.0  6.3  4.9   .28**
Pre-Education: Secondary      M       16      4.8  4.5  4.2  3.1  3.7  6.4  4.9  4.4   .10*
Pre-Education: Secondary      F       25      3.7  4.3  2.7  2.6  5.3  7.4  5.4  4.6   .39*
Pre-Nursing                   F       11      3.9  3.5  2.5  4.3  3.7  6.8  6.6  4.7   .33*
Nursing Education             F        9      2.3  6.2  2.6  4.4  3.3  7.2  5.7  4.2   [illegible]

* Significant at the .05 level.
** Significant at the .01 level.


The median intercorrelation
for the five male curriculum subgroups 
was .26, while the median coefficient for 
the six female subgroups was .32, sug- 
gesting that the female Ss were slightly 
more homogeneous in ranking the incen- 
tives than were the male Ss. However, 
the 11 coefficients were ranked and the 
Mann-Whitney U test (4, pp. 116-127) 
was used. The two-tailed probability was 
approximately .21 which indicates that 
the hypothesis of similar homogeneity 
among men and women Ss in ranking the 
incentives cannot be rejected. Two of the 
three smallest average intercorrelations 
were contributed by the college B.A. sub- 
groups and it is suggested that in future 
curriculum research the two major samples 
of Ss within these subgroups, majors in 
the humanities and in the social sciences, 
be treated separately in finding incentive 
profiles. The intercorrelation among the 
Pre-Education-Secondary men might be 
increased by splitting these Ss in future 
samples into those majoring in physical 
education and Ss majoring in more aca- 
demic high school subjects. 
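
The article cites "the usual formula" for the average rank-difference intercorrelation without printing it. One standard route, assumed here rather than confirmed by the source, goes through Kendall's coefficient of concordance W: the average rho among m rankers equals (mW − 1)/(m − 1), and W can be obtained from the sums of ranks (or, equivalently, from the mean ranks). The sketch below applies that identity to the mean ranks of the 51 Business Administration men; the function name is hypothetical. The result is about .38, which agrees with the coefficient printed for that subgroup in Table 1, although published mean ranks are rounded to one decimal and other rows may differ by a point or two in the second decimal.

```python
def average_rho_from_mean_ranks(mean_ranks, m):
    """Average Spearman rho among m rankers, from the mean rank of each item.

    Uses Kendall's W and the identity: average rho = (m * W - 1) / (m - 1).
    """
    n = len(mean_ranks)              # number of items ranked (eight incentives)
    grand_mean = (n + 1) / 2         # expected mean rank, 4.5 for eight items
    # Sum of squared deviations of the rank SUMS from their mean.
    s = m ** 2 * sum((r - grand_mean) ** 2 for r in mean_ranks)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return (m * w - 1) / (m - 1)


# Mean ranks of Incentives 1-8 for the Business Administration men (n = 51).
business_admin = [5.6, 5.3, 4.6, 4.1, 1.8, 6.7, 4.9, 3.0]
rho_bar = average_rho_from_mean_ranks(business_admin, m=51)
print(f"average rho = {rho_bar:.2f}")
```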

In order to combine the 11 curriculum 
subgroups into more meaningful major 
groupings by means of factor analysis the
eight incentive mean ranks for each of
the 11 curriculum subgroups were ranked 
and the intercorrelations among the 11 sets 
of ranks were computed by the rank- 
difference method. The mean correlation 
among the subgroups was .60 with indi- 
vidual coefficients ranging from .01 to .98. 
Nineteen of the 55 intercorrelations were 
significant at the .05 level of confidence. 
This matrix was factor analyzed by the 
usual centroid method and three orthogo- 
nal factors were extracted. The median 
absolute value of the residuals after ex- 
traction of the third factor was only .03,
indicating the absence of a fourth factor. 
The original factors were rotated to simple 
structure and the rotated orthogonal fac- 
tor loadings for the curriculum subgroups 
can be found in Table 2. 

The 11 curriculum subgroups appeared 
to condense into six major groupings on 
the basis of their patterns of loadings on 
orthogonal Factors X, Y, and Z, and their 
factor pattern groups (I to VI) are indi- 
cated in Table 2. Factor X, which ac- 
counts for 40 per cent of the variance 
among the subgroups, is obviously a sex 
factor with the male subgroups all having 
high loadings on this factor and the female 
subgroups showing moderate or low
loadings.


TABLE 2
FACTOR ANALYSIS OF RANK INTERCORRELATIONS AMONG CURRICULUM SUBGROUPS

Factor                                                          Factor Loadings
Pattern                            Sex    Number of
Group    Curriculum Subgroup       Group  Students      X          Y          Z         h²
I        College B.A.               M        35        .96    [illegible] [illegible]  .95
         Business Administration    M        51        .94      −.02        .10        .89
II       Engineering & Mines        M        42        .74       .09        .58        .89
III      Pre-Educ: Secondary        M        16        .90       .42        .00        .99
         College B.S.               M        30        .80       .54        .14        .95
IV       Nursing Education          F         9        .37       .44        .71        .83
         College B.S.               F        11        .37       .66        .58        .91
V        Pre-Educ: Secondary        F        25        .35       .89        .13        .93
         Pre-Educ: Elementary       F        26        .33       .86        .13        .87
         Pre-Nursing                F        11        .31       .76    [illegible]    .69
VI       College B.A.               F        11        .00       .91        .00        .83


Factor pattern Group VI is similar
to Group V except for an extremely low 
loading on the “masculinity” Factor X 
which appears to justify the separation of 
Group VI (College B.A. women) from the 
three female subgroups comprising Group 
V. Comparing the men and women Ss on 
the mean ranks they assigned to the indi- 
vidual incentives indicated that the men 
preferred Incentives 5 (Good prospects
for advancement) and 8 (Good salary),
while the women reported that Incentives
1 (Opportunity to learn new skills) and
3 (Freedom to assume responsibility)
would be more important in their job
selection.

Factor Y (37 per cent of the
intersubgroup variance) appeared to be almost
the reverse of the “masculinity” Factor
X with the exception that the male
subgroups within Group III (Pre-Education:
Secondary and College B.S.) had moderate
loadings on Factor Y (median loading =
.48) that were intermediate in size
between the loadings of the six female
subgroups (median loading = .81) and the
remaining three male subgroups (median
loading = .09). The factor pattern groups
were trichotomized as to their loadings on
Factor Y with Groups V and VI being
high (median = .88), Groups III and IV
being intermediate (median = .49) and
Groups I and II having low loadings
(median = .09). Incentives 3, 5, and 8
appeared to be related to Factor Y with
subgroup preferences for Incentive 3
being positively correlated with Factor Y
loadings and Incentive 5 and 8 preferences
being negatively related to the loadings.
Factor Y might be labelled a
“tenderminded social service” vs. a “toughminded
practicality” dimension from an inspection
of the curriculum subgroup loadings along
this factor. However, “naming” of this
factor is less important than is the use
of the factor loadings in empirically
clustering the subgroups.

Factor Z (12 per cent of intersubgroup
variance) was a triplet factor with the 
three subgroups included in Factor Pattern 
Groups II and IV showing large loadings 
and the remaining eight curriculum sub- 
groups showing zero loadings. Groups II 
and IV seem to prefer Incentive 1 more 
than did the remaining groups, but com- 
parisons of the high and low Factor Z 
groups showed slight differences on their 
mean rankings of the other incentives. 
We might suggest that Factor Z repre- 
sents an “interest in science and technol- 
ogy” factor, but again it should be empha- 
sized that identification of this factor by a 
name was not an aim of this study. 

Three factor scores had been computed 
for each of the 267 Ss in the 11 curriculum 
subgroups and three analyses of variance, 
one for each factor score, were computed 
to test for significant differences among 
the subgroups. The total variance (df = 
266) of a factor score was first split into 
the variance between the means of the 
11 subgroups (df = 10) and the residual 
individual differences variability among the 
Ss within the subgroups (df = 256). The 
11 within subgroup variances were tested 
for homogeneity by Bartlett’s chi-square 
technique before pooling and the chi- 
square values gave no evidence of signifi- 
cant heterogeneity for any of the three 
factor scores. The between subgroups vari- 
ance was further divided into two compo- 
nents: between factor pattern groups 
(df = 5) and the residual variability 
among the means of the subgroups within 
the factor pattern groups (df = 5). The 
summaries of these analyses of variance 
can be found in Table 3. All three factor 
scores discriminated among the subgroups 
at the .01 level of confidence. The differ- 
ences among the factor pattern groups 
were significant at the .01 level for all
three scores, while the residual variation
among the subgroups comprising the
pattern groups did not approach statistical
significance.


TABLE 3
ANALYSES OF VARIANCE OF CURRICULUM SUBGROUP
DIFFERENCES IN FACTOR SCORES

                                      Factor A          Factor B          Factor C
                                     Mean              Mean              Mean
Sources of Variation           df   Square     F      Square     F      Square     F
Between Subgroups              10    33.98   3.80**    39.69   4.67**    31.37   3.05**
  Between Patterns              5    62.09   6.95**    73.32   8.63**    52.47   5.13**
  Subgroups within Patterns     5     5.86    .66       6.05    .71      10.01    .97
Within Subgroups              256     8.94              8.50             10.29

** Significant at the .01 level.


Thus all of the variation in
mean factor scores could be accounted for 
by the grouping of the curriculum sub- 
groups accomplished through the factor 
analysis of subgroup profiles reported 
above. 

The intercorrelations among the factor 
scores were computed for each of the 11 
subgroups and averaged (weighted mean). 
The average intercorrelations (df = 245) 
among the factor scores were: A and B, 
r = .16 (significant at the .05 level); A 
and C, r = −.24 (significant at the .01
level); and B and C, r = —.05 (not sig- 
nificant). Although two of the three fac- 
tor score intercorrelations are statistically 
significant for this large sample of Ss, all 
three are low enough to indicate that the 
scores are relatively independent measures 
of job incentive dimensions. 

TABLE 4
MEAN FACTOR SCORES OF THE CURRICULUM
FACTOR PATTERN GROUPS

Factor Pattern
Group             N    Factor A    Factor B     Factor C
I                86      1.06       −2.17         1.06
II               42      2.69       −1.55          .76
III              46      1.87        −.80         2.03
IV               20      4.90       −1.25         1.60
V                62      2.95     [illegible]     3.11
VI               11      2.73        1.09          .09


The mean factor scores for each of the
pattern groups were computed and are
given in Table 4. For Factor A, Group 
IV appears to be quite high in “need 
achievement” while Groups I and III 
show a high degree of “fear of failure.” 
Groups V and VI have mean factor scores 
(Factor B) indicating strong interest in 
the job, while the other groups, particu- 
larly Group I, view the job as a means of 
advancing their own status and position. 
The average Factor C scores show Groups 
III and V to prefer jobs that are free of 
close and immediate contact with super- 
vision and Group VI Ss express a need 
for a strong and intimate dependency re- 
lationship with their supervisors. 


Discussion 


From one point of view the present
study had two questions: (a) whether the
somewhat crude factor “scores”
developed in the previous study (1) were valid
in discriminating differences between
subgroups of Ss who had chosen different
college curricula, and (b) whether a
factor analytic grouping of the curriculum
subgroups would account for a sizable
[portion of the] intersubgroup variability
and provide information [about the]
dimensions of job incentives for college Ss.
Table 1 indicates that several mistakes
were originally made in defining the
(assumed) homogeneous curriculum
subgroups. Both the men and women
college B.A. subgroups should have been
further split to achieve more homogeneity,
probably by separating Ss majoring in the
Humanities from those majoring in the
Social Sciences. Similarly, the men in the
Pre-Education subgroup should be divided
into more similar subgroups. However,
these errors in the a priori subgrouping
of the Ss could not be corrected after the
Table 1 results were known without
leaving the study open to the charge of data
manipulation. These subgrouping errors
appear to have had little effect on the
answers to the original two questions.

The 11 subgroups condensed nicely into
six broader groupings on the basis of the
factor analysis reported in Table 2.
Although we labelled the three obtained
Factors X, Y, and Z, as “masculinity,”
“tenderminded social service vs.
toughminded practicality,” and “interest in
science and technology,” it must again be
emphasized that we are not convinced that
these are adequate appellations and that
“naming” of the factors was unimportant
for our purpose. We were interested solely
in the use of factor analysis here to allow
us to combine curriculum subgroups into
a more parsimonious set of major
groupings.

The answers to the two major research
questions are found in Table 3. The
factor scores showed differences among the
11 curriculum subgroups that were highly
significant. Parenthetically it might be
noted that the data given in Table 3
allow the computation of epsilon
correlation coefficients to provide rough indices
of the “validity” with which the Factor
A, B, and C scores discriminated
differences among the subgroups. Epsilon is a
curvilinear correlation method similar to
the [eta] correlation coefficient included in
[most] statistics texts and is based upon
the ratio of the variance of the means of
the subgroups to the variance within the
subgroups (3, pp. 319-324). These validity
coefficients were .31, .35, and .27. The
average intercorrelations among the factor
scores were also low enough to estimate 
that each score provided some independent 
information about subgroup differences. 
Table 3 also shows that practically all 
of the subgroup differences in scores were 
accounted for by the differences among 
the factor pattern groups. Another way of 
saying this is that the curriculum sub- 
groups included within the six larger 
groupings were quite homogeneous in fac- 
tor scores. Curriculum differences account 
for approximately 10, 12, and 7 per cent 
of the inter-subject variability in factor 
scores with the six factor pattern groups 
accounting for practically all of this cur- 
riculum-related variance. 
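
The link between the epsilon coefficients and the percentages quoted above can be checked with a short calculation. The sketch below is an illustration only. It assumes the common textbook form of epsilon squared (one minus the ratio of the within-group mean square to the total mean square), which may differ in detail from the formula in the cited text (3); the function names and the toy data are hypothetical. Squaring the reported coefficients of .31, .35, and .27 gives roughly 10, 12, and 7 per cent.

```python
import math


def epsilon_squared(groups):
    """Epsilon squared for a list of groups of scores.

    Assumed form (not taken from the source):
    1 - (within-group mean square / total mean square).
    """
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    ms_within = ss_within / (n - k)
    ms_total = ss_total / (n - 1)
    return 1.0 - ms_within / ms_total


def epsilon(groups):
    """Epsilon itself; a negative estimate is reported as zero."""
    return math.sqrt(max(0.0, epsilon_squared(groups)))


# Hypothetical toy data: three subgroups of three scores each.
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_eps2 = epsilon_squared(toy)

# Reported validity coefficients, squared and expressed as per cent.
reported = [0.31, 0.35, 0.27]
per_cent = [round(100 * e ** 2) for e in reported]
print(per_cent)  # [10, 12, 7]
```

The squared values are what make the two statements in the text agree: coefficients near .3 correspond to about a tenth of the inter-subject variability.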

It seems fair to conclude from this pre- 
liminary study that (a) the method of 
factor analysis provides a technique by 
which the dimensions of job incentives 
can be isolated and relatively independent 
factor scores can be estimated for indi- 
vidual Ss, and (b) these factor scores are 
significantly related to college Ss’ choices 
of academic curricula. However, further 
expansion and delineation of the factors 
underlying job incentives is needed with 
a consequent development of additional 
factor scores that can be estimated more 
reliably. 


SUMMARY 


A list of eight job incentives was ranked 
by 267 college Ss and the Ss were divided 
into 11 subgroups on the basis of sex and 
college curriculum variables. A factor anal- 
ysis of the subgroup incentive profiles iso- 
lated three factors (tentatively labelled 
“masculinity,” “tenderminded social service vs.
toughminded practicality,” and “interest
in science and technology”) and condensed 
the 11 subgroups into six major curricu- 
lum groupings on the basis of the subgroup 
patterns of factor loadings. Scores on three 
factors isolated in a previous factor analy- 
sis of the incentives were computed for 
individual Ss and significant differences 
(.01 level) were found among the six
factor pattern groups for all three factor
scores.

[...] correlation coefficient. Psychometrika,
1952, 4, 421-428.
RELATIONSHIP OF INTELLIGENCE AND SOCIAL POWER TO
THE INTERPERSONAL BEHAVIOR OF CHILDREN¹


ALVIN ZANDER AND ELMER VAN EGMOND


Research Center for Group Dynamics, University of Michigan


There are contradictory beliefs about
the behavior of highly intelligent children
in school, and particularly about their
participation in problem-solving discussions.
These children are described as
both influential and impotent, tolerant
and impatient, supporting and rejecting,
eager and bored. Little information exists
which can help us to separate fact from
fancy among these assertions, nor do we
know much more about the ways in which
intelligent persons differ from less intelligent
ones in these respects.

There is a reason for this lack of information.
Intelligence as a concept is
primarily intended to describe an ability
to deal with cognitive problems. There
are few elements in definitions of intelligence
which suggest that variations in intellectual
ability are associated with variations
in face to face behavior. Hence,
intelligence has seldom been used as an
independent variable in studies of social
behavior.

In a decision making group, however,
a person with high intelligence may perhaps
offer wiser observations than one with
lesser intelligence. Because of his greater
resources, we can anticipate that a
bright child would be more influential
than a less intelligent one. Because he is
more expert, he should have greater social
power, that is, a greater ability to influence
others. On the basis of assumptions
like these, it is apparent that intelligence
can be a cause for particular types of
interpersonal behavior. Some children, regardless
of their degree of intelligence, are able
to exercise strong influence on their peers,
while others are consistently ignored by
classmates.

¹ The research reported herein was performed
pursuant to a contract with the
United States Office of Education, Department
of Health, Education and Welfare.

How then does a person’s intelligence 
affect the way he acts toward others when 
his group must reach a decision? Does he 
behave differently when he is used to hav- 
ing his ideas accepted (has high social 
power) than when he is used to being ig- 
nored? These are primary interests in this 
study. 

Boys and girls are expected and re- 
quired to behave differently in social set- 
tings. Boys, for example, are more often 
pressed than are girls to be concerned 
with achievement and influence. Because 
prescriptions for the two sexes differ, it is 
probable that the meaning and effects of 
intelligence or social power differ for boys 
and girls. A secondary purpose of this 
study is to examine the impact of intelli- 
gence and social power on the interper- 
sonal behavior of boys compared to girls. 

Toward these purposes, measures were 
made of the intelligence and social power 
of all children in a number of classrooms. 
These persons were then put in standard- 
ized, small, problem-solving groups. Their 
participation was observed in terms of pre- 
coded categories to see how those with dif- 
ferent degrees of power and intelligence 
differed in their behavior. Data were also 
obtained from teachers and classmates 
concerning characteristics of these children 
in regular classrooms. 


Major Issues 


In the small discussion groups, we assumed
children would behave in the ways
they typically do in their classes to try to
win acceptance for their ideas.

It was expected that intelligence should
make a difference in the actions initiated 
by them, as already mentioned, because 
intelligent persons have more resources to 
offer in a problem-solving discussion than 
do less intelligent ones. Also, they might 
have more confidence in their own pro- 
posals, stemming from the ready accept- 
ance of them in the past. Highly intelli- 
gent pupils, therefore, were expected to 
make more efforts to influence others, to 
have their ideas accepted more often than 
less intelligent pupils, and to behave in 
ways which could be taken as typical of 
persons with confidence in themselves. 

Children with greater social power were 
expected to make more efforts to influence 
others, to be more successful in doing so, 
and to behave in ways which have been 
shown in other studies to be typical of 
persons with greater power (2, 3, 4, 5, 
6). 

How do teachers characterize children 
who have different degrees of intelligence 
and social power? To determine this, 
teachers were asked to rate the behavior 
of Ss in categories roughly similar to those 
used in observing the behavior of the 
children in the small groups. What as- 
pects of the teachers’ opinions, based on 
day by day experience with these children, 
support or contradict the behavior shown 
in the standardized group situations? Were 
teachers able to differentiate between the 
behavior of one type of person and that 
of another? 

Finally, it is useful to know how class- 
mates characterized the children to whom 
they ascribed high social power as com- 
pared to those to whom they attributed 
little power. The Ss were rated by their 
peers concerning their ability in school- 
work, their attractiveness, and their abil- 
ity to coerce or threaten others. These 
personal qualities were considered to be 
separate sources of social power as sug-
gested by French and Raven (2). It was 
anticipated that persons with greater so- 
cial power, in contrast to those with less, 
would have these qualities ascribed to 
them more often by classmates. Do highly 
intelligent persons differ from the less in- 
telligent in these respects? 


METHOD 


Data used in this research were originally
collected by colleagues for a different
but related purpose.² In the original
investigation, measures were obtained concerning
all children in 16 second grade and
16 fifth grade classrooms, representing all
socioeconomic levels in a medium-sized
city. Children were selected in that study
for a field experiment concerned with the
creation of changes in the social position
and behavior of group members. The
measures were made in order to establish
a baseline, so that the amount of change
in a participant’s behavior could be determined.
The data employed in the present
investigation are from these pre-experimental
measures.

From the original population of 638
children on whom measurements were
available, Ss were chosen for this study
who had degrees of intelligence and power
in required combinations. Pupils designated
as high in intelligence were those in
the upper 33 per cent of their class and
those designated as low in intelligence
were in the lower 33 per cent. Children
designated as high in power were in the
upper 50 per cent of their class and those
designated as low in power were in the
lower 50 per cent of their class on this
measure. The sample included 226 boys
and 192 girls, 230 second grade children
and 188 fifth graders.

² The collection of the original data was
supported by Grant-in-Aid (M-919) from
the National Institute of Mental Health,
National Institutes of Health, US Public
Health Service. Ronald Lippitt was project
director. We are grateful for his kindness in
furnishing the data for our use.
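
The selection rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used: the 33 per cent cutoffs are approximated as thirds of the class, ties are broken by sort order, and the function names, child identifiers, and scores are all hypothetical.

```python
def split_levels(scores, parts):
    """Label the top and bottom 1/parts of one class by rank.

    scores maps a child id to a score. Children in the top n // parts
    get "high", those in the bottom n // parts get "low", and anyone
    in between gets None. Ties are broken by sort order (an assumption).
    """
    n = len(scores)
    ranked = sorted(scores, key=scores.get)  # lowest score first
    cut = n // parts
    labels = {child: None for child in scores}
    for child in ranked[:cut]:
        labels[child] = "low"
    for child in ranked[n - cut:]:
        labels[child] = "high"
    return labels


def assign_cells(intelligence, power):
    """Return (intelligence level, power level) for each retained child."""
    intel = split_levels(intelligence, 3)  # upper and lower thirds
    pow_level = split_levels(power, 2)     # upper and lower halves
    return {
        child: (intel[child], pow_level[child])
        for child in intelligence
        if intel[child] is not None and pow_level[child] is not None
    }


# Hypothetical class of six children.
iq = {"a": 95, "b": 120, "c": 101, "d": 88, "e": 110, "f": 130}
power = {"a": 3.1, "b": 2.0, "c": 2.5, "d": 1.2, "e": 3.5, "f": 1.9}
cells = assign_cells(iq, power)
print(cells)
```

Children in the middle third on intelligence drop out, which is why the final sample of 418 is smaller than the original population of 638.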

The type of data available for each 
child and the source of each are as fol- 
lows: 

Intelligence. Intelligence scores were obtained
from school records. They were based
on the results of the Kuhlman-Anderson
test, administered in a group form.

Ratings by classmates. Every class member
rated every child in his class on four
characteristics: social power (who can most
often get you to do things for him?),
attractiveness, ability in school work, and
ability to threaten others. To obtain these
ratings, photographs of every child in the
classroom were printed on sheets with a
four-point rating scale next to each picture.
Observed behavior in groups. The members
of a class were divided into four
smaller groups on a random basis and each
group was sent to a corner of the room
where it worked on assigned problems. A
trained observer was stationed in each corner
and recorded on a precoded observation
schedule the quantity and quality of behavior
initiated by each child.³ Four problems
were given to the groups, each of which
required a group decision as a first step in
the completion of the task. In
one problem, for example, the group built
[...] and in another arrived at a decision
as to how many beans were in a [...].
At the end of each task new groups
were formed for the next problem, thus
providing maximum opportunity for each child
[...] with every other child in the
[...] Chairmen were designated for these
discussions.

³ The observers had satisfactory inter-observer
reliability. Information concerning reliability
of the observations will be available
in publications concerning the project from
which these data were borrowed.

The observed behavior was coded into the
following categories:

1. Influence attempts: efforts to influence
others regardless of the manner employed.

2. Successful influence attempts: efforts
in which compliance was obtained from
others.

3. Unsuccessful influence attempts: efforts
in which compliance was not obtained from
others.

4. Demanding influence attempts: comments
made in an ordering or directing manner,
implying autonomy for the actor.

5. Suggestions: comments indicating weak
proposals to, or requests of, others.

6. Valuing of others: behavior indicating
recognition of either high or low resourcefulness
of another person in an area of
knowledge or skill.

7. Positively valuing others’ behavior:
comments indicating recognition of high
value in others’ behavior.

8. Negatively valuing others’ behavior:
comments indicating recognition of low value
in others’ behavior.

9. Affect-laden behavior: behavior in
which overt friendliness or unfriendliness is
observed which is not a direct attempt to
influence another.

10. Aggressive behavior: acts of aggression
which are either inflicted or threatened.

11. Mean weighted directness in style:
ratio of frequency of forceful to nonforceful
forms of behavior toward others.
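
Categories 2 and 3 appear to partition category 1, so for any group the mean number of influence attempts should equal the sum of the successful and unsuccessful means, up to rounding. The sketch below applies that check to means reported in Tables 2, 4, and 5. It is an illustration only: the tolerance, the row labels, and the function name are assumptions introduced here.

```python
# Each mean is rounded to two decimals, so three roundings can differ
# from an exact sum by at most 0.015.
TOLERANCE = 0.015

# (group, total attempts, successful, unsuccessful), as printed in the tables.
ROWS = [
    ("less intelligent boys, high power", 24.34, 13.87, 10.48),
    ("less intelligent boys, low power", 16.00, 7.96, 8.04),
    ("highly intelligent boys, low power", 21.63, 11.97, 9.66),
    ("highly intelligent boys, high power", 25.55, 15.76, 9.78),
    ("highly intelligent girls, low power", 13.57, 7.21, 6.36),
    ("highly intelligent girls, high power", 16.69, 9.79, 6.90),
    ("less intelligent girls, high power", 15.96, 8.81, 7.15),
    ("less intelligent girls, low power", 13.49, 5.61, 7.88),
]


def discrepancies(rows, tol=TOLERANCE):
    """Return labels of rows where total != successful + unsuccessful."""
    bad = []
    for label, total, successful, unsuccessful in rows:
        if abs(total - (successful + unsuccessful)) > tol:
            bad.append(label)
    return bad


mismatched = discrepancies(ROWS)
print(mismatched)
```

A check of this kind is also a convenient way to catch transcription errors in the tabled means.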

Perceptions by teachers. Every child was 
rated by his teacher on seven characteristics 
descriptive of the typical social behavior of 
the person in his schoolroom. The teacher
was asked to indicate on a five-point scale 
the extent to which each child showed the 
behavior under consideration (from “hardly 
ever” to “most often”). The qualities rated 
are shown in Table 6. 

Teachers did not know how much social 
power classmates had attributed to the Ss 
at the time they made their own ratings. 
Information concerning the intelligence of 
these children was available to teachers in 
school records.


RESULTS


Characteristics attributed to the Ss by
their peers, for boys and girls with different
degrees of power, are first discussed.
Results are then presented for (a) the observed
behavior of Ss in the small groups
and (b) the perceptions of Ss’ daily behavior
by teachers. Finally, the results
are summarized and interpreted.

For the sake of brevity the data in
tables are usually limited to statistically
significant findings. The omission of results
for a specific category of behavior
indicates that no significant differences
were obtained for it.

At the outset it should be noted that 
social power is not highly correlated with
intelligence. The correlation for boys was
.20 and for girls .28. The low relationship
between intelligence and power indicates
that they may vary quite independently.
Since both of these correlations are sig- 
nificant at the .01 level of probability, 
however, it is also clear that brighter 
children tend to have more power than 
less intelligent ones. 
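
A correlation can be small and still reliably different from zero when the sample is large, which is the point made above. The sketch below shows the usual t test for a correlation coefficient. It assumes the correlations were computed over all 226 boys and all 192 girls in the sample, which the text does not state explicitly, and it uses an approximate two-tailed .01 critical value of 2.60 for about 200 degrees of freedom; the function name is hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Approximate two-tailed .01 critical value for df near 200 (assumption).
CRITICAL_T_01 = 2.60

t_boys = t_from_r(0.20, 226)   # assumes N = 226 boys
t_girls = t_from_r(0.28, 192)  # assumes N = 192 girls

boys_significant = t_boys > CRITICAL_T_01
girls_significant = t_girls > CRITICAL_T_01
print(boys_significant, girls_significant)
```

Under these assumptions the t values come to about 3.1 for boys and 4.0 for girls, both beyond the .01 cutoff, even though the correlations leave most of the variance in social power unexplained by intelligence.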


Characteristics Attributed to Subjects by 
Classmates 


Both boys and girls who were attributed
high social power were more attractive to
classmates than those low in power, regardless
of their intelligence. These results
may be seen in Table 1. Girls with greater
power were rated as more able in schoolwork
than girls low in power irrespective
of their intelligence, while boys were described
as more able in schoolwork to a
significant degree only among those high
in power and intelligence. Boys with higher
power were described as more threatening.
This was not true for girls.

Boys and girls with higher intelligence
were seen by classmates as significantly
more able in schoolwork than less intelligent
persons (M 2.24 and 1.76, p of diff. =
.01). Girls high in intelligence were rated
as more attractive than those low in intelligence
(M 2.40 and 2.01, p of diff. = .005).
Ability to threaten was not related significantly
to intelligence.


TABLE 1
MEANS OF CHARACTERISTICS ATTRIBUTED BY CLASSMATES TO
CHILDREN WITH VARIED SOCIAL POWER

                                 Boys                      Girls
                             Social Power              Social Power
                        High M  Low M     t       High M  Low M     t
Children high in intelligence
  Attractive             2.13    1.52   6.42**     2.09    1.50   7.60**
  Able in school work    2.19    1.43   6.33**     2.14    1.58   7.18**
  Threatening            3.02    2.76   3.29**     2.89    2.86    .39
  N                        58      38                73      42
Children low in intelligence
  Attractive             2.21    1.59   6.59**     2.39    1.65  11.69**
  Able in school work    2.44    1.73   1.15       2.55    1.76  16.28
  Threatening            2.91    2.78   1.86*      2.98    2.90    .90
  N                        46      84                26      51

* p = .05.


BEHAVIOR OF BOYS IN SMALL GROUPS


We consider the types of behavior boys
used in the problem solving discussions.


Effects of Variations in Power


Among highly intelligent boys, those
high in social power were not significantly
different in any category of observed behavior
from those low in power.

Among less intelligent boys, those high
in power, compared to those low in power,
revealed a vigorous, inconsistent, and competitive
pattern of behavior. The quantitative
results may be seen in Table 2.


Effects of Variations in Intelligence


Among boys with high power, the behavior
of the more intelligent was in no
way significantly different from that shown
by the less intelligent.

Among boys with low power, those with
greater intelligence, compared to those
with less, were active, effective, and supportive
of others. These data are summarized
in Table 3.


BEHAVIOR OF GIRLS IN SMALL GROUPS


Effects of Variations in Power


Among girls with high intelligence, those
with high power were not observed to behave
differently from those with low
power.

Among girls who were less intelligent,
those with high social power were only a
little different from ones low in power.
Those with more power were more often
successful in their influence attempts (M
8.81 and 5.61, p of diff. = .01) and were
more positive in their remarks about
others’ behavior (M 1.28 and .58, p of
diff. = .01).


Effects of Variations in Intelligence


Regardless of their social power, variations
in intelligence were not associated
to a significant degree with differences in
girls’ observed behavior.


BEHAVIOR OF BOYS AND GIRLS
DIRECTLY COMPARED


A direct comparison of the behavior
shown by boys and girls, regardless of their
power or intelligence indicates the actions
most typical of each sex. The following
behaviors were observed among boys significantly
more often than among girls:
attempts to influence, successful influences,
unsuccessful influences, aggression,
and demands. Girls did not display any
type of behavior significantly more often
than boys. It is evident that boys were
considerably more active and demanding
in their groups than were girls.


TABLE 2
BEHAVIOR OF LESS INTELLIGENT BOYS
VARYING IN SOCIAL POWER

                                           Social Power
                                          High M   Low M     t
Total influence attempts                   24.34   16.00   3.18**
Freq. successful influence attempts        13.87    7.96   3.16**
Freq. unsuccessful influence attempts      10.48    8.04   1.98*
Freq. demanding influence attempts          5.33    3.46   2.10*
Freq. suggestions                          18.90   12.30   3.13**
Total valuing of others’ behavior           2.48    1.73   2.03*
N                                             46      84

* p = .05.
** p = .001.


TABLE 3
GROUP BEHAVIOR OF BOYS WITH LOW
SOCIAL POWER VARYING
IN INTELLIGENCE

                                           Intelligence
                                          High M   Low M     t
Total influence attempts                   21.63   16.00   2.02*
Freq. successful influence attempts        11.97    7.96   2.06*
Freq. suggestions                          16.86   12.30   2.04*
Freq. positively valuing others’ behavior   1.37     .89   2.58**
N                                             38      84

* p = .05.
** p = .01.
TABLE 4
BEHAVIOR OF HIGHLY INTELLIGENT BOYS VERSUS GIRLS VARYING IN SOCIAL POWER

                                          Low Social Power           High Social Power
                                        Boys M  Girls M    t        Boys M  Girls M    t
Total influence attempts                 21.63   13.57   2.30*       25.55   16.69   3.08***
Freq. successful influence attempts      11.97    7.21   1.80*       15.76    9.79   2.62***
Freq. unsuccessful influence attempts     9.66    6.36   2.39**       9.78    6.90   1.79*
Freq. suggestions                         6.87   10.04   3.13***     19.95   13.80   2.75***
Total valuing of others’ behavior         2.16    1.67    .97         2.29    2.09    .48
Freq. demanding influence attempts        4.63    3.14    .28         5.17    2.61   2.64***
Freq. aggressive behavior                 2.97    1.50   1.89*        2.79    1.54   [illegible]
Total affect-laden remarks                1.95    2.02    .15         2.46    1.63   3.92***
M weighted demandingness in behav.       21.15   20.95    .31        22.64   21.25   1.50
N                                           38      42                  58      73

* p = .05.
** p = .01.
*** p = .001.


When members of both sexes were high 
in intelligence but low in power, boys were 
more active and aggressive than were girls. 
These data are shown in Table 4. 

Where members of both sexes were 
high in both intelligence and power, the 
patterns of behavior were similar to those 
just described. In addition, high-high boys 
were demanding in their comments and 
used more affect-laden types of behavior 
than high-high girls (see Table 4). 

In contrast to the girls, then, highly 
intelligent boys were likely to be active 
in their groups regardless of their social 
power and likely to be aggressively in- 
sistent in stating their opinions when high 
in both power and intelligence. 

Among the less intelligent children the
boys again appeared to be more involved
in their groups than the girls if they were
high in power. A comparison of the behavior
of boys and girls is presented in
Table 5. Boys with low intelligence and
low power were very little different from
girls with low intelligence and power.


TEACHERS’ PERCEPTIONS OF BOYS


We turn to an examination of the
qualities teachers attributed to these children.


Effects of Variations in Power


Among highly intelligent boys, the
teachers made clear distinctions on every
category between boys who were high in
power and those who were low in power.
These results are shown in Table 6.

Among less intelligent boys, the teachers
made similar distinctions in the characteristics
they attributed to boys high in
power and those low in power. These results
are listed in Table 6. In the eyes of
teachers, boys with greater power are
strikingly different in their social behavior
from those with less power.


TABLE 5
BEHAVIOR OF LESS INTELLIGENT BOYS VERSUS GIRLS AS RELATED TO SOCIAL POWER

                                            High Social Power          Low Social Power
                                          Boys M  Girls M    t       Boys M  Girls M    t
Total influence attempts                   24.34   15.96   2.44**     16.00   13.49   1.09
Freq. successful influence attempts        13.87    8.81   1.99*       7.96    5.61   1.59
Freq. unsuccessful influence attempts      10.48    7.15   1.90*       8.04    7.88    .12
Freq. suggestions                    [illegible]   13.23   2.07*      12.31    9.75    .84
Freq. positively valuing others’ behavior   1.19     .61   2.04*        .89     .58   1.63*
Total valuing others’ behavior              2.48    1.54   1.96*       1.76    1.41    .99
M weighted demandingness                   21.98   19.70    .30       21.95   20.09   2.86***
Freq. aggressive behavior                   3.24    2.00   1.40        3.13    2.20   1.50
N                                             46      26                 84      51

* p = .05.
** p = .01.
*** p = .001.


Effects of Variations in Intelligence


Teachers made no distinctions to a significant
degree between boys high in intelligence
and those low in intelligence, regardless
of their social power.


TEACHERS’ PERCEPTIONS OF GIRLS


Effects of Variations in Power


Among girls high in intelligence, those
with high social power were seen to be
different from girls low in power as shown
in Table 7.

Among girls low in intelligence those
with high social power were perceived by
teachers to be different from those with
low social power in only two respects.
Girls with higher power were seen as more
often successful in influencing others (M
4.82 and 3.77, p of diff. = .005) and more
friendly (M 4.30 and 3.46, p of diff. =
.005).


Effects of Variations in Intelligence 


Among girls with greater social power, 
teachers saw girls with high intelligence 
as being little different in their classrooms 
from those with low intelligence: the girls 
were viewed as making more attempts 
to influence classmates (M 4.26 and 3.80, 
p of diff. = .02) and as more successful 
in these attempts (M 3.77 and 3.19, p of 
diff. = .005). 

Among girls with little social power,
highly intelligent girls compared to less
intelligent ones were seen by teachers as
making more frequent attempts to exercise
influence in the class (M 4.29 and 3.76, p
of diff. = .02), as more successful in these
efforts (M 4.82 and 3.78, p of diff. = .005),
and as more friendly (M 4.29 and 3.62, p
of diff. = .005).


TABLE 6
CHARACTERISTICS TEACHERS ATTRIBUTE TO BOYS WITH
DIFFERENT DEGREES OF SOCIAL POWER

                              Boys high in intelligence     Boys low in intelligence
                                   Social Power                  Social Power
                              High M   Low M     t          High M   Low M     t
Freq. influence attempts       4.05    3.70    1.50          4.10    3.52    2.40**
Amt. success as influencer     4.10    3.24    3.69***       4.46    3.50    4.38***
Friendliness                   3.95    3.41    2.43**        4.27    3.41    [illegible]
Depends on teacher             3.57    4.62    3.84***       3.76    4.32    3.08**
Depends on peers               3.68    4.45    3.00***       3.56    4.04    2.23*
Self-centeredness              3.58    4.26    2.90***       4.03    3.68    1.60
Degree of forcefulness         3.68    4.44    3.00***       3.56    4.04    2.23*
N                                58      38                    46      84

* p = .05.
** p = .01.
*** p = .001.


TEACHERS’ PERCEPTIONS OF BOYS
AND GIRLS COMPARED


The perceptions teachers had of boys
and girls differed to a significant degree
only where the members of both sexes
were high in power while low in intelligence.
Here, boys were seen as more active
in making attempts to influence classmates
and more self-centered in doing so
than were girls.


DIFFERENCES DUE TO AGES
OF SUBJECTS


The results on all dimensions for second
graders were compared with those of fifth
graders. There was no acceptable evidence
that differences in ratings of Ss by peers
or teachers, or in their observed behavior
in the small groups, were associated with
differences in age.


TABLE 7
CHARACTERISTICS TEACHERS ATTRIBUTE TO
HIGHLY INTELLIGENT GIRLS WITH
DIFFERENT DEGREES OF SOCIAL
POWER

                                  Social Power
                              High M   Low M     t
Amt. success as influencer     3.78    3.19    3.89**
Friendliness                   3.62    3.07    2.75*
Depends on teacher             4.00    4.38    1.80*
Self-centeredness              3.93    4.63    3.16
N                                73      42

* p = .05.
** p = .001.


SUMMARY AND DISCUSSION OF RESULTS


Characteristics Attributed by Peers


Boys and girls were similar in that those
with greater power, compared to those
with less, were better liked by their peers.
Attractiveness then was an important
basis of power for both sexes.

Boys and girls were different in that 
boys with greater power were seen by 
classmates as more threatening, and girls 
With greater power were rated as more 
expert “in the things you do in school.” 
These results suggest that among these
children the two sexes won social power
on the basis of different attributes;
boys earned it by being threatening
and girls earned it by being skilled
in the things required of a school child.


Observed Behavior in Groups 


The behavior of boys may be briefly
summarized by noting that those who were
low in social power and low in intelligence
(low-lows) were strikingly different from
boys with all other combinations of intelligence
and power. The significant differences
we have seen in the behaviors of
boys stem primarily from the fact that the
low-lows were passive persons in their
groups, more than boys who were high in
either power or intelligence. The low-lows
less often tried to influence
others, were less successful in doing so,
were less demanding in manner, less often
evaluating in respect to others’ contributions,
and less often suggestors of tentative
proposals. Boys with various combinations
of power and intelligence other than low-low
were in no way different from one another
to a statistically significant degree
in their observed behavior, that is, high-highs
were not different from high-lows
or from low-highs. It is worth special note
that highly intelligent boys whose social
power was low behaved in no way different
from those whose social power was
high.

Because the low-lows were different from
those with more power or intelligence in
ways which were quite comparable, and because
the high-highs were in no way different
from the high-lows or low-highs,
it appears that the possession of greater
intelligence may be the same as the possession
of greater power insofar as the
effect upon behavior in these groups is
concerned.

This similarity is due, we believe, to 
the fact that the contributions made by 
highly intelligent boys in a problem solving 
discussion represent resources which are 
valuable to the group. The possession of 
these resources, we assume, provided 
power for the owners to influence those 
who valued them. It is highly probable, 
although it cannot be demonstrated with 
the data available here, that boys with 
high intelligence were treated by group 
members as though they were persons with 
high power. As a consequence, boys with 
high intelligence became aware of the value 
of their ideas and of the influence they 
were having on the discussion. Although 
they had little power accorded to them 
previously by their classmates, boys with 
high intelligence apparently behaved in 
the groups like those who had come to 
the group with social power already at- 
tributed to them by their peers. 

It is noteworthy that boys low in in- 
telligence yet high in social power tended 
to be more inconsistent than low-lows in 
that they made both demands and weak 
proposals, and praised as well as criticized 
others, whereas boys high in intelligence 
but low in power more consistently pro- 
posed ideas and supported others’ be- 
havior than did the low-lows. The pos- 
sibility is suggested by these findings that 
greater power among boys generated more 
inconsistent and coercive group behavior 
while greater intelligence evoked a con- 
sistency and readiness to be considerate 
in relations with others. This conjecture 
is supported by the findings that peers 
characterized boys with greater power as 
threatening but did not so describe boys 
with greater intelligence. 

Girls who were low-lows were less suc- 
cessful in influencing their groups and less
often made positive remarks than girls 
high in power but low in intelligence. This 
indicates that low-low girls, like the low- 
low boys, were passive in their groups’ 
discussions. Low-low girls were not dif- 
ferent in any respect from girls low in 
power and high in intelligence, which sug- 
gests that high intelligence was not simi- 
lar to power among girls, as was earlier 
noted for boys. 

In most combinations of intelligence 
and power, boys were significantly more 
active in seeking to influence others, more 
often successful, more often unsuccessful, 
and more likely to evaluate the comments 
made by others than were girls. Low-low 
boys were very much like girls and most 
like low-low girls, differing from them in 
only two respects: they were significantly 
more likely to be demanding and to be 
positive in commenting upon the con- 
tributions made by others. 

In sum, the possession of either power 
or intelligence by boys appeared to stim- 
ulate vigorous and successful participa- 
tion in their groups’ work while the pos- 
session of low power and low intelligence 
together generated passivity in boys. The 
amount of intelligence or power a young 
person possesses affected a boy’s behavior 
more than it did a girl’s behavior in these 
problem solving discussions. 

To explain the consequences of power 
and intelligence for boys and girls we as- 
sume that these two attributes have a 
differential significance for the sexes in 
allowing them to conform to the expecta- 
tions society puts upon them. Barry, 
Bacon, and Child (1) have reported that 
boys are expected to be self-reliant and 
to strive for achievements, while girls are 
urged to be nurturant, obedient, and re- 
sponsible in almost all societies, including 
ours. 

We assume that, in their group be-
havior, the boys and girls were attempting
to conform to these prescriptions for their
sexes. Boys who were high in either in- 
telligence or social power more clearly ful- 
filled their sex roles than those lacking 
in both of these. Boys who were low in 
both intelligence and power least often 
showed the behavior required of their sex. 
The low degree of their intelligence and 
power apparently made them unable to 
perform in ways typically expected of 
them. Thus, either social power or in- 
telligence was necessary for boys if they 
were to act as boys are expected to act. 
There is some indication that social power 
was more important than intelligence in 
this respect. 

Girls who were high in power and in- 
telligence were little different from those 
who were low in either of these qualities 
because, we believe, high social power and 
intelligence were not needed in order to be 
the nurturant, obedient, or responsible 
persons required by society. Girls could 
fulfill these expectations regardless of the 
amount of power or intelligence they 
possessed. 


Perceptions by Teachers 

The children who were accorded higher
social power by their classmates were
viewed by teachers as most influential,
since the teachers rated boys and girls
with higher power as more successful in
influencing others than those low in social
power.

On almost every category, teachers dis-
tinguished between boys low in power and
those high in power, regardless of the boys’
intellectual abilities. It is striking in this
connection that teachers saw those with
greater power as less forceful and more
friendly than those with less power,
whereas boys with greater power (and low
intelligence) were demanding and less
friendly than those with less power when
observed in the small groups. Clearly,
teachers did not perceive the boys with
greater social power as threatening persons
in the way that classmates saw them.

In their considerations of girls, appar- 
ently those who were high in either intelli- 
gence or power were seen as more effective 
members of their classes than those low 
in these respects.

Why did the teachers characterize boys
who had greater power (but low intelli-
gence) as more considerate of others than
was observed among them in the small
groups? The most likely explanation is
that their behavior actually was different
in their classrooms from what it was in
the discussions. In a class the teacher is
in charge so that nondemanding, friendly
behavior is required and becomes the
standard way for interacting with peers
in school. It is also probable that
teachers approved of achievement-ori-
ented and influential efforts when they
were used by boys since such behavior is
in accord with the demands that teachers,
among other adults, place upon young
[boys]. Thus, teachers attributed positive
characteristics to the boys whom they
viewed as most influential in their class-
rooms.

In the small groups no one was in
charge. Hence, the boys with greater
power, but less intelligence, were free to
[insist on] having their opinions accepted
and were free to use coercion when neces-
sary in order to be influential.

Teachers did not perceive differences in
[social] behavior between boys who
were high in intelligence and those who
were low in intelligence although differ-
ences between these two groups were evi-
dent in the problem solving discussions.
It appears that variations in intelligence
do not generate variations in social be-
havior in a classroom, of the kind that
teachers were asked to report.

It is apparent that girls were less active
in the classes and therefore less visible
to teachers than the boys. Thus,
teachers made fewer distinctions in char-
acterizing the behavior among girls with
different amounts of power than they did 
among boys. Girls with higher power were 
seen by teachers as most friendly to class- 
mates, which suggests that through this 
friendliness they won influence in the 
classroom. Girls with higher intelligence 
were also seen by teachers as being more 
friendly and as making more attempts to 
influence others than girls with low in- 
telligence. It is probable that the girls 
with high intelligence did use acceptable 
forms of social influence in their school- 
room relations since they were rated as 
attractive by their classmates, more so 
than those with low intelligence. This ac- 
ceptance by their peers apparently gave 
them confidence to exercise their influence 
freely and teachers noted this in charac- 
terizing highly intelligent girls. 

In summary, teachers perceived the be- 
havior of girls as pretty much alike regard- 
less of their power or intelligence. They 
made no significant distinctions among 
boys with different degrees of intelligence, 
but they saw many distinctions in the 
behavior of boys who differed in social 
power. Assuming that the teachers’ per- 
ceptions are accurate, it is evident that a 
boy’s social power determines his be- 
havior in the classroom more than his 
intelligence does, whereas differences in 
the power or intelligence of a girl have
little effect upon the behavior she employs 
in the schoolroom.


The present article considers the idea
that college cultures may be seen as a
complex of environmental press which, in
turn, may be related to a corresponding
complex of personal needs. In the psycho-
logical literature, one is indebted to Henry
Murray (5) for the dual concept of per-
sonal needs and environmental press. In
the broadest sense, the term “need” refers
to denotable characteristics of individuals,
including drives, motives, goals, etc. The
term “press” can similarly be regarded as
a general label for stimulus, treatment, or
process variables. Murray’s concept of
needs has provided a starting point for
the construction of various objective
measures of personality (2, 3, 4). No
parallel development in the objective
measurement of environmental press, how-
ever, has previously been attempted. Col-
lege students differ. College environments
also differ. The concept of press offers a
way of viewing the environment which is
comparable analytically and synthetically
to the more familiar ways of dealing with
the individual.


DEVELOPMENT OF THE COLLEGE
CHARACTERISTICS INDEX


Using Murray’s classification of needs
as a model, Stern has constructed several
experimental editions of a needs inven-
tory called the Activities Index. In its
present form the Activities Index con-
sists of 300 statements of commonplace,
socially acceptable activities to which re-
sponses of “like-dislike” are given. There
are 30 scales of 10 items each, correspond-
ing to 30 needs in Murray’s taxonomy.
Some scales can be scored positively or
negatively, as for example conjunctivity-
disjunctivity, succorance-autonomy, im-
pulsion-deliberation, etc., so that the total
number of needs to which scores can be
attached is 42 rather than 30. A pre-
liminary manual for the Activities Index
describes the test in detail (8).

2 Portions of this research have been
financed by grants from the College En-
trance Examination Board and the Carnegie
Corporation.

A corresponding test for describing col- 
lege environments, called the College 
Characteristics Index, was subsequently 
constructed. It consists of 300 statements 
about college environments to which re- 
sponses of “True-False” are given. The 
statements are organized into 30 ten-item 
scales, with a press scale for each need 
scale that was included in the Activities 
Index. The following kinds of questions
guided the writing of items: what might
be characteristic of an environment which
exerted a press toward order, or toward
autonomy, or toward nurturance, or un-
derstanding, or play, etc.? Stated in an-
other way, what might there be in a
college environment which would be satis- 
fying to or tend to reinforce or reward 
an individual who had a high need for 
order, or autonomy, or nurturance, or 
understanding, or play, etc.? The items 
themselves are statements about college 
life. They refer to the curriculum, to col- 
lege teaching and classroom activities, to 
rules and regulations and policies, to 
student organizations and activities and 
interests, to features of the campus, etc. 

Sample items from corresponding Need 
and Press scales will illustrate the par- 
allelism. 

A need for Order would be inferred from 
liking such activities as: “Arranging my 
clothes neatly before going to bed. Hav-
ing a special place for everything and
seeing that each thing is in its place. 
Keeping a calendar or notebook of the 
things I have done or plan to do.” What 
might such a person like to find in a col- 
lege environment or what features of a 
college environment might be rewarding 
or frustrating to such a need? The fol- 
lowing items from the press scale for 
Order might be relevant: “Faculty mem- 
bers and administration have definite and 
clearly posted office hours. In many classes 
students have an assigned seat. Profes- 
sors usually take attendance in class.” 
On the need scale for Impulsion-De- 
liberation, a high score for Impulsion re- 
sults from liking such activities as: “Being 
in a situation that requires quick decisions 
and actions. Doing things on the spur 
of the moment. Doing whatever I’m in the 
mood to do.” Thus, a college environment 
which has a press toward impulsiveness 
might be a place where: “Most students 
don’t decide what courses to take until 
the time of registration. Students often 
start projects without trying to decide in 
advance how they will develop or where 
they may end. Spontaneous student ral- 
lies and demonstrations occur frequently.” 
A high need for Energy is inferred from 
liking such activities as: “Taking up a 
very active sport. Having something to 
do every minute of the day. Giving all 
of my energy to whatever I happen to be 
doing.” The needs of such a person might 
be expected to find fulfillment and satis- 
faction in a college environment where: 
“There is an extensive program of intra- 
mural sports and informal athletic ac- 
tivities. Student gathering places are typi- 
cally active and noisy. Class discussions 
are typically vigorous and intense.” 
Just as needs are inferred from the 
characteristic modes of response of an in- 
dividual, so press are reflected in the 
characteristic pressures, stresses, rewards, 
conformity-demanding influences of the 
college culture. Operationally, press are
the characteristic demands or features as
perceived by those who live in the par- 
ticular environment. To each statement in 
the College Characteristics Index the per- 
son who takes the test answers true if he 
believes it is generally characteristic of 
the college, is something which occurs or 
might occur, is the way people tend to 
feel or act; and he answers false if he 
believes it is not characteristic of the col-
lege, is something which is not likely to
occur, is not the way people typically 
feel or act. 
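
The true-false keying just described can be sketched in a few lines. This is only a minimal illustration, assuming (the article does not spell it out) that a scale score is the count of a respondent's answers that agree with the keyed direction on a ten-item scale. The key and the responses below are hypothetical and are not items from the College Characteristics Index.

```python
def score_scale(responses, key):
    """Count the items answered in the keyed direction (0 to len(key))."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for r, k in zip(responses, key) if r == k)


# Hypothetical key for one ten-item press scale (True = keyed "true").
key = [True, True, False, True, True, False, True, True, True, False]
# Hypothetical answers from one respondent.
responses = [True, False, False, True, True, True, True, True, False, False]

score = score_scale(responses, key)
print(score)
```

Averaging such scores over the respondents at one institution would give the kind of scale mean discussed in the test results.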


A STUDY OF FIVE INSTITUTIONS


A first draft of the College Characteris-
tics Index was administered in May, 1957,
to groups of students at five institutions 
and to smaller groups of faculty members 
at four of the five institutions. In all, 423 
students and 71 faculty members re- 
sponded to the instrument. Neither stu- 
dent nor faculty groups were representa- 
tive samples. Most of the students were 
upperclassmen and most of the faculty 
members were in the upper academic
ranks. It was argued that if a dominant
press really exists in a particular environ-
ment almost any group of people living
in the environment would probably
identify it. The testing program was, in
any case, intended only as a preliminary
try-out of the model from which some
information would be gained about the 
types of items, the possible reliability 
and validity of the scales, and the poten- 
tial utility of this approach to measuring 
college characteristics. 

The five institutions, although not iden-
tified here, were selected because observers
would probably agree that they are rather
different from one another, with the selec-
tion of colleges thus providing some evi-
dence about the construct validity of the
test. One was a large Midwestern state
university. The second was a large Mid-
western private university. The third in-
stitution was a large Eastern private uni-
versity. The fourth was a moderate-sized
Eastern private college for men. The fifth
institution was a publicly supported col-
lege in the metropolitan New York area.


Test Results 


Saying that a particular press is or is
not characteristic of an institution is an
arbitrary matter. There exist no conven-
tions or experience to guide the decision.
The basis for the tentative decisions that
were made, together with other statisti-
cal information about the scales and the
items, is given in the following para-
graphs.

In examining means, or a profile on
which mean scores are plotted, one natu-
rally looks for what appear to be high
and low points. And in examining variances
or sigmas in the context of press
identification, one naturally looks for
a concentration of scores or a low
variance, suggesting a consistency of im-
pression, rather than a wide dispersal of
scores, suggesting divergent impressions
about the press.
Table 1 shows the means and standard
deviations on each of the press scales,
obtained from the students’ responses at
the five institutions studied. Each of these
30 scales has a maximum possible range of
0 to 10. The median of these mean scores
is approximately 5.5. The median of the
standard deviations is approximately 1.7.
Putting this information together, one
might suggest that, for the five institu-
tions represented, a fairly reasonable defi-
nition of a noticeable press (or its ab-
sence) would be one which required a mean
score falling in the upper or lower one
[illegible] of the total distribution. Mean
scores of 6.6 or higher and mean scores
of [illegible] or lower would thus be suggestive
of press. In Table 1, means which meet
this criterion are in italics.
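
The cutoff rule can be written out as a short sketch. The upper cutoff of 6.6 is the one stated in the text. The lower cutoff cannot be read from this copy, so it is left as a parameter, and the value 4.4 used in the example is an assumption chosen only for illustration. The scale means are likewise hypothetical and are not entries from Table 1.

```python
def classify_press(scale_means, high_cutoff=6.6, low_cutoff=None):
    """Label each scale mean as a high press, a low press, or neither."""
    labels = {}
    for scale, mean in scale_means.items():
        if mean >= high_cutoff:
            labels[scale] = "high"
        elif low_cutoff is not None and mean <= low_cutoff:
            labels[scale] = "low"
        else:
            labels[scale] = "none"
    return labels


# Hypothetical means for three scales (not taken from Table 1).
example_means = {"Order": 7.2, "Play": 5.4, "Abasement": 3.1}
# 4.4 is an assumed lower cutoff, used here only for illustration.
labels = classify_press(example_means, high_cutoff=6.6, low_cutoff=4.4)
print(labels)
```

With fixed cutoffs of this kind, the same rule is applied to every scale and every institution, which is what lets the profiles of different colleges be compared.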

A corresponding table showing faculty
means and sigmas is not presented, mainly
because the number of cases is quite
small, the Ns being 25, 20, 11, and 15. Also,
from an educational view, it may be ar- 
gued that the effective press of an environ- 
ment is what the students say it is, not 
what the faculty say, or what the cata- 
logue says, etc. Nevertheless, in the four 
colleges where both student and faculty 
responses to the CCI were obtained it is 
of some pertinence to note their similar- 
ity. Table 2 shows a distribution of the 
differences between the means of stu- 
dents’ and faculty responses in each in- 
stitution and also the differences between 
faculty and student responses to indi- 
vidual items. Here it is evident that most 
of the differences are fairly small. In 
nearly half of all the comparisons between 
faculty and student mean scores the dif- 
ference between the two was less than 
half a point. Three fourths of all the dif- 
ferences were less than 1.00. Differences 
between faculty and student responses to 
individual items are grouped in three ar- 
bitrary categories. Thus, 44% of all the 
items answered by students and faculty 
were answered within a range of agree- 
ment of 10 percentage points or less; and 
on 12% of the items the percentage for 
the students’ answer differed by 30 points 
or more from the percentage for the 
faculty’s answer. Considering the number 
of cases in all groups, one can be confident 
that differences of 10 points or less are 
in all instances merely chance differences, 
and that differences of 30 points or 
greater may in most instances be signifi- 
cant differences at least at the 5% level 
of confidence. It appears then, that some 
of the differences in the middle category 
of the table are chance differences and 
some are significant differences. One might 
estimate that, over all, about three fourths 
of the items were answered in tolerably 
good agreement by both students and 
faculty, and that perhaps as many as one 
fourth of the item responses represent 
divergent views between students and 
faculty in characterizing the institution.


TABLE 1

MEANS AND STANDARD DEVIATIONS ON THE VARIOUS PRESS SCALES FROM
STUDENTS’ RESPONSES AT FIVE INSTITUTIONS

             College A    College B    College C    College D    College E
Scale        (N = 100)    (N = 44)     (N = 100)    (N = 68)     (N = 111)
             M     SD     M     SD     M     SD     M     SD     M     SD

(Numerical entries are illegible in the source; the 30 scales are listed in their original row order.)

Abasement
Achievement
Adaptiveness
Affiliation
Aggression-Blameavoidance
Change-Sameness
Conjunctivity-Disjunctivity
Counteraction
Deference
Dominance
Ego-Achievement (a)
Emotionality-Placidity
Energy-Passivity
Exhibitionism
Fantasied Achievement (b)
Harmavoidance
Humanism (c)
Impulsion-Deliberation
Narcissism
Nurturance
Objectivity-Projectivity
Order
Play
Pragmatism (d)
Reflectiveness (e)
Scientism (f)
Sentience
Sex
Succorance-Autonomy
Understanding

(a) Derived from Exocathection-Intraception.
(b) Derived from Ego Ideal.
(c) Derived from Endocathection-Extraception: Social Sciences and Humanities.
(d) Derived from Exocathection-Extraception.
(e) Derived from Endocathection-Intraception.
(f) Derived from Endocathection-Extraception: Natural Sciences.


Reliability 


Test-retest data are not available in the 
present study. Conventional estimates of 
reliability from a single administration 
may not be appropriate for an instrument 
on which one hopes to find skewed dis- 
tributions and minimal dispersion of 
scores. Faculty-student agreement within
the same institution is of some relevance
but one might argue that the percep-
tions of these two groups could differ on indi-
vidual items or scales, yet each could be
reliable within itself.

Item analysis data are more directly
relevant to the reliability of scales. An
item analysis was made of each scale,
separately for each of the five institutions,
using the students’ responses. Ebel’s
simplified method was used. Each of the
30 scales was thus subjected to item 
analysis in five different samples, and since 
there are 10 items in each scale, the total 
number of item discrimination indexes 
obtained was 1500. Of these 1500 dis- 
crimination indexes, 1% were negative, 
18% fell between 0 and .19, 30% fell 
between .20 and .39, and 51% were .40
or higher. In other words, 81% of the
items had, on the average, moderate to
high discrimination in their respective
scales.
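
For readers unfamiliar with discrimination indexes, the following is a rough sketch of one common form, the upper-lower index: the proportion answering an item in the keyed direction among the highest scorers on the scale, minus the same proportion among the lowest scorers. The article does not describe the details of Ebel’s simplified method, so the 27 per cent group size and the data below are assumptions for illustration only, not a reproduction of the analysis reported here.

```python
def discrimination_index(item_keyed, total_scores, fraction=0.27):
    """Upper-lower index: keyed proportion in the top group minus the bottom group."""
    n = len(total_scores)
    if n != len(item_keyed) or n < 2:
        raise ValueError("need matching lists with at least two respondents")
    k = max(1, round(n * fraction))          # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    low_group, high_group = order[:k], order[-k:]
    p_high = sum(item_keyed[i] for i in high_group) / k
    p_low = sum(item_keyed[i] for i in low_group) / k
    return p_high - p_low


# Hypothetical data: 1 = item answered in the keyed direction.
item_keyed = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
# Hypothetical scale scores for the same ten respondents.
total_scores = [9, 8, 8, 7, 6, 5, 4, 3, 2, 1]

d = discrimination_index(item_keyed, total_scores)
print(round(d, 2))

# Counts quoted in the text: 30 scales x 10 items x 5 samples.
n_indexes = 30 * 10 * 5
moderate_or_high_pct = 30 + 51
```

An index near zero means that high and low scorers on the scale answer the item alike; a negative index means the item runs against the rest of its scale.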

Perhaps the most important approach
is one which treats reliability and validity
as inseparable and deals with the in-
strument as a whole. For example, do
different people characterize the institu-
tion in the same way? This involves the
reliability of profiles, with all their inter-
relationships. As a first approximation of
this, the rank order of mean scores from
the students’ responses can be compared
with the rank order of mean scores from
faculty responses within the same institu-
tion. Thus, do these groups see the insti-
tution in relatively the same pattern? For
the two colleges (Colleges B and C) which
had the largest number of faculty re-
spondents, these rank order correlations
were .96 and .88. Correlations were not
computed for the other two colleges where
faculty mean scores would be based on
Ns of 15 and 11.
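
A rank order correlation of this kind can be computed as in the sketch below, which ranks two sets of scale means (tied values share the average rank) and then correlates the ranks. The five pairs of means are hypothetical; the actual computation would use all 30 scale means from the students and from the faculty of one college.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Rank order correlation: the product-moment correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical scale means for five scales (not data from the study).
student_means = [7.1, 5.0, 3.2, 6.4, 4.8]
faculty_means = [6.8, 5.5, 3.9, 6.0, 5.6]

rho = spearman_rho(student_means, faculty_means)
print(round(rho, 2))
```

A value near 1 indicates that the two groups order the scales in nearly the same way, whatever the absolute level of their mean scores.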


Validity 


To illustrate the sort of interpretation
and description of a college environment
which can be derived from the CCI, the
profiles of two colleges are presented. The
statements are based entirely on the ar-
bitrary definitions of what levels of scores
constitute a press (which was explained
earlier); and the nature of these press is
illustrated by citing some of the
specific items which most clearly define
them. No estimate is presently available
of the validity of these descriptions against
a systematic outside criterion. But the
descriptions show quite clearly that these
are very different environments, and that
the test is therefore capable of revealing
some sharp distinctions between colleges
which qualified observers would expect to
be different. The evidence is therefore
relevant to the property of validity. De-
tailed data, and descriptions of the other
three colleges, may be found elsewhere
(7).


TABLE 2
SUMMARY OF DIFFERENCES BETWEEN FACULTY AND STUDENT RESPONSES
WITHIN EACH OF FOUR INSTITUTIONS

                                        Number of      Cumulative
                                        Differences    Per Cent
Mean Differences in Scale Scores
  .00 to ±.49                                57            48%
  ±.50 to ±.99                               34            76%
  ±1.00 to ±1.49                             23            95%
  ±1.50 to ±1.99                              6           100%
  Total                                     120

Percentage Differences in Responses
to Individual Items
  10 percentage points or less              528            44%
  Between 11 and 29 points                  528            88%
  30 points or more                         144           100%
  Total                                    1200
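
The cumulative per cents in Table 2 follow directly from the counts, as the short check below shows. The counts are those given in the table (120 scale comparisons, that is 30 scales in each of four institutions, and 1200 item comparisons, that is 300 items in each of four institutions); rounding to the nearest whole per cent is an assumption about how the published figures were obtained.

```python
def cumulative_percents(counts):
    """Running total of counts as whole per cents of the grand total."""
    total = sum(counts)
    running = 0
    percents = []
    for count in counts:
        running += count
        # Round half up to the nearest whole per cent.
        percents.append(int(100 * running / total + 0.5))
    return percents


scale_differences = [57, 34, 23, 6]    # mean differences in scale scores
item_differences = [528, 528, 144]     # percentage differences on items

scale_cumulative = cumulative_percents(scale_differences)
item_cumulative = cumulative_percents(item_differences)
print(scale_cumulative)
print(item_cumulative)
```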
College A. 


The major press in College A are toward 
orderliness and friendly helpfulness, with 
overtones of spirited social activity. This 
is suggested by high scores on the scales 
for Order, Objectivity, Conjunctivity, Nur- 
turance, Play, Ego Achievement, Exhibi- 
tionism, and by low scores on the scales 
for Abasement, Impulsion, and Aggression. 

The stress on Order, Deliberation (op- 
posite of Impulsion), and Conjunctivity 
is indicated by such highly shared ob- 
servations as the following: students have 
assigned seats in some classes, professors
often take attendance, papers and reports
must be neat, buildings are clearly marked, 
students plan their programs with an ad- 
viser and select their courses before regis- 
tration, courses proceed systematically, it 
is easy to take clear notes, student ac- 
tivities are organized and planned ahead. 

Within this orderliness, student life is 
spirited and a center of interest. For ex- 
ample, big college events draw lots of 
enthusiasm, parties are colorful and lively, 
there is lots to do besides going to classes 
and studying, students spend a lot of time 
in snack bars and in one another’s rooms, 
and when students run a project everyone 
knows about it. 

At the same time, amid this student- 
centered culture, there is a stress on 
idealism and service. Students are expected 
to develop an awareness of their role in 
social and political life, be effective citi- 
zens, understand the problems of less 
privileged people, be interested in chari- 
ties, etc. 

The total picture of the environment, 
then, is one of high social activity, esprit 
de corps, and enthusiasm combined with 
an emphasis on helping others and idealis- 
tic social action and all within a fairly 
well understood set of rules and expecta- 
tions which are deliberative and orderly. 
One would expect some of the explicit 
objectives of such an institution to stress 
personal and social development, idealism 
and social action, and civic responsibility.


College B. 


Here the dominant press of the environ- 
ment fall in the theoretical-intellectual 
category—Reflectiveness, Humanism, Sci- 
entism, Understanding, and Objectivity. 
This dominant press occurs in an environ- 
ment also characterized by Change, non- 
defensive acceptance of criticism (Adap- 
tiveness), and by resistance to any abject 
acceptance of criticism or presumed low 
status (Abasement). Moreover, on two of 
the scales which defined a high press at
College A (Play and Order), the press at
B is exactly opposite. There is, further, a
minimum of importance to social status
or manipulation for tangible ends (Prag-
matism), preoccupation with self and per-
sonal appearance (Narcissism), and boss-
ing or directing others (Dominance).
There is, however, a generally consistent
and high press toward Deliberation or
planning and thinking ahead.

It is clear that the most pervasive press 
is directed toward the pursuit of under- 
standing for its own sake, abstract and 
unencumbered by requirements for prac- 
tical utility or social action. 

The theoretical-intellectual press of the
environment at College B is more spe-
cifically suggested by the following ob-
servations with which, generally, more
than nine tenths of both faculty and stu-
dents agree: there are excellent library
resources in natural science and social
science, a lecture by an outstanding phi-
losopher or scientist would draw a capac-
ity audience, many students are planning
graduate work or careers in science or
social science, there are many opportuni-
ties for students to see and hear and criti-
cize modern art and music, reasoning and
logic are valued highly in student reports
and discussions, students who spend a lot
of time in a science laboratory or in try-
ing to analyze or classify art or music or
in seeking to develop a personal system
of values are not regarded as odd, scholar-
ship and intellectual skills are regarded
as more important than social poise and
adjustment, there is time for private
thought and reflection, one need not be
afraid of expressing extreme views, the
faculty and administration are tolerant
and understanding in interpreting regula-
tions.

In contrast with College A, students at
B do not have an assigned seat in class,
professors do not take attendance, students
are likely to study over the weekend,
big college events draw no great enthusi-
asm, and the place is not described as
one where “everyone has a lot of fun.”

Moreover, student leaders have no
special privileges, family status is not im-
portant, students are not much concerned
about personal appearance and grooming,
and an intellectual is not an “egghead.”

And finally, exams are not based on
factual material from a textbook, classes
are not characterized by recitation and
drills, grade lists are not publicly posted,
students are not publicly reprimanded for
mistakes, student organizations are not
closely supervised, students tend to stay
up late at night, work all the harder if
they have received a low grade, and if
[faced] with a regulation they do not
like they will try to get it changed.

One would expect the explicit objectives
of such an institution to stress the acqui-
sition of knowledge and theory, critical
[thinking] and independence, and a sense
of the significance of intellectual life.


Other Differences Between Colleges 


That College A and College B are quite
different environments can be suggested
statistically as well as in the descriptive
profiles just presented. The rank order of
mean scores for College A correlates, for
example, .06 with the rank order of mean
scores for College B. One can also esti-
mate, without computing, the significance
of the differences between the mean scores
of College A and College B on each of the
30 scales. Given Ns of 100 and 44, and
standard deviations as large as 2.00, a
difference of 1.00 is significant well beyond
the [illegible] level of confidence. On at least 22
of the 30 scales, the differences between
means for the two schools are significant.
Considering all five schools, it is clear that
on every scale except one (Deference)
there are significant differences between
two or more of the mean scores.
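
The estimate for a difference of 1.00 can be checked with a short calculation. The sketch below forms the ratio of the difference to its standard error for two independent groups, using the figures given in the text (Ns of 100 and 44 and standard deviations of 2.00); treating the two groups as independent samples with unpooled variances is an assumption, since the article does not state which formula was in mind.

```python
import math


def mean_difference_ratio(diff, sd1, n1, sd2, n2):
    """Difference between two means divided by its standard error."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff / standard_error


# Figures quoted in the text for Colleges A and B.
ratio = mean_difference_ratio(1.00, 2.00, 100, 2.00, 44)
print(round(ratio, 2))
```

The ratio comes to about 2.76. Because most of the standard deviations are smaller than 2.00, the ratio for a difference of 1.00 on most scales would be larger than this.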

A further indication of differences can be
obtained from noting the differences be-
tween the percentages of students at Col-
leges A and B answering each item ac-
cording to the key. In the middle range
of percentages a difference of 20 points, 
with Ns of 100 and 44, is always signifi- 
cant beyond the 1% level. On this con- 
servative basis there were 172 of the 300 
items on which the percentages for Col- 
lege A differed significantly from College 
B. 

The point to these observations is 
merely to suggest that the first trial run 
of the CCI produced many results which 
clearly differentiated among the environ- 
ments or press of the five colleges. The 
actual items in the CCI are not repro- 
duced here because there seems little 
virtue in printing a first draft. The con- 
tent of many items has been, of course, 
indicated in the descriptive profiles of the 
two colleges and in the earlier paragraphs 
which noted the parallel structure of 
Need items and Press items. 


CONCLUSIONS AND IMPLICATIONS 


After completing the preliminary stud- 
ies reported thus far, a revised form of the 
College Characteristics Index was pro- 
duced in which 58% of the original items 
were retained, 13% were slightly modified, 
and 29% of the items were new. The re- 
vised form, at the time of writing this 
article, has been administered to approxi- 
mately 1200 students distributed among 
more than 30 institutions. Further re- 
search and more intensive analyses have 
been planned. Before commenting on spe- 
cific plans, however, certain broad values 
and implications in this psychological ap- 
proach to the measurement of college 
environments are suggested. 

One potential value, for example, is in 
institutional self-analysis. Administrators 
and faculty members should be able to 
learn something useful about the dynamics 
of the college environment from studying 
students’ responses to the College Charac- 
teristics Index. Institutional press should 
have some clear relationship to institu-
tional purpose. The objectives of a col-
lege are formal or explicit statements of
intent: they indicate the directions in 
which a college means to influence the be- 
havior of students. They find expression 
in curricula, practices, services, policies, 
and other aspects of the college environ- 
ment. The press, as measured by the 
method described, constitute what Stern, 
Stein and Bloom (9) have referred to as 
an operational definition of objectives, or 
the implicit influence of the environment 
upon the students. Implicit press and 
explicit objectives should reinforce one 
another, for an institution should operate 
in reality the way it means to operate in 
theory. Consequently, a serious lack of 
congruence between implicit press and ex- 
plicit objectives would suggest to faculty 
members and administrators that certain 
aspects of the environment ought to be 
changed in order to make the total impact 
of the institution more consistent or more 
effective. Pace (6) has commented else- 
where on the disintegrative effect of dis- 
crepancy between stated objectives and 
actual practices. 

Some aspects of an environment can be 
changed more readily than others. The 
College Characteristics Index provides 
some direct indications of the psychologi- 
cal implications of various policies and 
practices. Roughly, one fourth to one 
third of the items in the Index state spe- 
cific practices which an administration or 
faculty could more or less easily change 
if they did not like the implications. For 
example, being able to drop a course in 
which one is having difficulty, or to sub- 
stitute another course for one which has 
been failed, is associated with Counter- 
action; insisting that students’ reports or 
papers be neat, or giving students an as- 
signed seat in class, or taking attendance 
regularly, is associated with Order. As 
the relationships among press variables 
and between these variables and institu- 
tional objectives as well as personal needs 
are established, the significance of such
specific practices will become clarified.
Other items in the Index are more indirect 
in their implications about the effect of 
various policies or practices. But the clues 
can be investigated and can thus be the 
starting point for serious discussions about 
the impact of the environment on the 
student and the relation of this impact 
to the intended objectives.

Another set of implications from this 
approach to describing college environ- 
ments relates to the problem of assess- 
ment and prediction. Assessment studies 
have often failed because the situations 
or environments in which assessed modes 
of behavior were supposed to occur have 
been inadequately described or differenti- 
ated. The interaction between person and 
environment was not successfully pre- 
dicted because the environment was not 
measured as analytically and systemati- 
cally as the person. The whole field of 
college prediction studies provides a good 
example. The criterion is typically a 
grade point average. No fundamental improvement
in predicting against this criterion
has been made in the last 25 years.
Prediction studies should be concerned
with performance in the environment as a
whole. The complexity of relationship between
person and environment is inevitably
obscured by the simplified and
often inappropriate symbolism of correlation
between scholastic aptitude test and
grade point average. The press of a college
environment represents what must
be faced and dealt with by the student.
It is possible that the total pattern of
congruence between personal needs and
environmental press will be more predictive
of achievement, growth, and
change than any single aspect of either
the person or the environment.

It will be a long time before admissions
officers or guidance counselors can benefit
from the results of these more complex
analyses. This requires the establishment
of known relationships between kinds of
persons and kinds of environments. But
conceivably, advisers within an institution
may be better able to help students find
an effective and rewarding role within the
operative environment of the college, or
to see more clearly the ways in which en- 
vironments need to be modified if differ- 
ent kinds of students are to grow within 
them most effectively. 

As further steps toward refining and exploring
the potential of this needs-press
concept, as exemplified by the Activities
Index and the College Characteristics Index,
the following studies, among others,
are in progress:

1. Statistical studies of the instrument
(CCI), including test-retest reliabilities,
correlation matrix of all scale scores, factor
analysis, item covariances.

2. Comparative studies of types of
items, subjective or impressionistic versus
relatively objective or factual.

3. Studies of the relations between students’
needs scores and the corresponding
perception of press in the environment.

4. Analysis of perception of press among
various sub-cultures within a complex institution.

5. Significance of congruence between
need and press in determining successful
performance and/or satisfaction in the
college environment.


SUMMARY 


The present article has suggested that a
college environment may be viewed as a
system of pressures, practices, and policies
intended to influence the development
of students toward the attainment of important
goals of higher education. The first
draft of an instrument for measuring these
influences systematically has been constructed.
Data analyzed thus far reveal
significant differences in the press of dif- 
ferent college environments. The instru- 
ment itself, which has subsequently been 
revised, appears to be promisingly reli- 
able and valid. In the long run, research 
which this type of instrument makes pos- 
sible should increase understanding of 
the ways in which institutions make their 
impact upon students and provide a 
broader conceptualization for evaluating 
the effectiveness of higher education.

In 1931, 212 children in the San Francisco
Bay Area between the ages of 2 and
5½ years were given tests which later constituted
Forms L and M of the 1937 Revision
of the Stanford-Binet. Careful methods
were developed and adhered to in the
selection of these children to avoid biased 
sampling and to insure a representative 
group, because they were the California 
sample of the nationwide standardization 
of this revision (9, pp. 12-15, 18; 7, pp. 
6-7, 36-37). 

Ten years later, in 1941, 138 of these 
children still in the area were adminis- 
tered Form L of the Stanford-Binet. 
Thirteen of the original group were not 
included because as two-year-olds they 
had missed so many items at the lowest 
test level that the obtained mental age 
was a maximum expression of their intelli- 
gence, the true estimate not being de- 
terminable. Another 61 could not be lo- 
cated. 

In 1956, 111 of the 1941 group were 
located and given Form L of the Stanford-Binet
and the Wechsler Adult Intelligence
Scale. Nine Ss, although located, refused
to participate; one S was in a mental hospital
and not accessible for testing; another
S was located in a community where
no provisions could be made for testing;
the remaining Ss could not be located. A
detailed report of the results of this follow-up
and their theoretical implications
is in the process of preparation. This is
being interrupted now to present a summary
of the actual IQ changes over the
25-year period because of the pending appearance
of a new revision of the Stanford-Binet*
and the appropriateness of
having these retest data on the standardization
group available at the same time.

1 This investigation was supported by Research
Grant M-1273 from the National
Institutes of Health, Public Health Service.

2 At Stanford University when participating
in this research.

their standardization data available for fol-


SUBJECTS AND PROCEDURE 


The Ss in the present study were ad- 
ministered Forms L and M of the Re- 
vised Stanford-Binet Scale 25 years pre- 
viously, in 1931, in connection with its 
standardization. They were also examined 
with Form L of this same scale in 1941 
(3, 4, 5). No interim contacts had been
made. A comparison of the 1941 retest
group of 138 Ss with the total standardization
group at these ages showed some
upward selection, as the initial mean composite
IQ of the retest group was 109.2
compared with 105.4 for the standardization
group. Most of this difference was
concentrated in the younger age groups,
where the children for whom the test was
too difficult had been eliminated. The distributions
of paternal occupational level
for the retest and standardization groups
were similar at all ages.

* Personal communication from Maud A.
Merrill.

The loss of Ss between the 1941 and 1956 
testings resulted in further selection with 
respect to initial mean IQ, which rose to
112.8 for the 1956 group with the elimination
of 27 Ss, 25 of whom were not located
or refused to participate. Apparently
availability for follow-ups was positively
related to intelligence, although again the
greatest discrepancy was for the group
which had been two years of age at the
initial testing.

In the current testing of 1956, Form L 
of the Revised Stanford-Binet and the 
Wechsler Adult Intelligence Scale were
administered. (In one case the WAIS was 
incomplete. In another the 1931 IQ is not 
valid; this case was included only in com- 
parisons with the 1941 results.) Of the 111 
Ss, 98 were tested by Cravens; two were 
tested by Bradway, who had administered 
all tests in the 1941 follow-up; the remaining
11 Ss were tested by psychologists
at colleges and universities located near
the Ss’ current place of residence. Mobility
was less than had been anticipated:
of the 122 Ss located, only 24 lived more
than 40 miles from where they had lived
in 1931.

The retest group was comprised of 52 
men and 59 women. This is the same proportion
as in the 1941 testing. Inasmuch
as no sex differences in test results were
found in the standardization (9, p. 34),
the results in the present study are given
for the total group, combining both sexes.


6 The authors wish to acknowledge their
indebtedness to the following psychologists
who participated in this way: Steven K.
[surname illegible], Walter H. Brackin, Maurice Deigh,
[given name illegible] Gray, George S. Ingebo, W. B.
[surname illegible], Jeanne Reitzell, Gordon and Virginia
[surname illegible], Alwyn Sessions, Paul S. Spitzer,
and Helene R. Veltfort.
TABLE 1
OBTAINED STANFORD-BINET IQs BY YEAR OF TESTING

Year      Mean IQ    SD
1931 a    112.8      15.9
1941 b    112.3      16.4
1956 b    123.6      15.0

a Composite Form L and Form M.
b Form L.


RESULTS 


The degree of relationship between the 
initial composite IQ (Forms L and M) of 
these Ss when they were in the age range 
from 2.0 to 5.5 (mean age 4.0) and Form 
L IQ when they were in the age range 
from 26.5 to 32.2 (mean age 29.5) is ex- 
pressed by a Pearsonian r of .59. This com- 
pares favorably with the r of .65 found 
for the same group in the first follow-up 
after only 10 years. The correlation be- 
tween the 1941 and the 1956 testings is 
.85.

All but 19 of the 110 Ss showed a higher 
IQ on the 1956 testing than they did in 
1931. This was not limited to any one seg- 
ment of the IQ range. The mean increase 
was 10.8 points. The results in Table 1 
show that this increase occurred between 
1941 and 1956, that is between the early 
adolescent years and adulthood, since the 
mean IQs in 1931 and 1941 were approxi- 
mately the same. The mean IQ increase 
from 1941 to 1956 was 11.3 points. The 
increases from 1931 to 1956 and from 1941 
to 1956 are, of course, highly significant 
statistically with CRs of 8.1 and 13.7 re- 
spectively. The 1956 WAIS mean IQ of 
108.9 ± 11.0 more nearly approximates
the 1931 and 1941 Stanford-Binet mean
IQs than does the 1956 Stanford-Binet.


The correlational values for the 1956 
WAIS for this group are similar to those for 
the 1956 Stanford-Binet. Pearsonian r with
1931 Stanford-Binet is .64; with 1941 it is
.80. The correlation between the two 1956
tests is .83.

These data are interpreted as invalidating
the assumption that intelligence stops
increasing at 16 years, on which was based 
the standard computation of an adult 
Stanford-Binet IQ. This is consistent with 
the findings of other recent investigators. 
Bayley (1) in the Berkeley Growth Study 
found increases in Stanford-Binet scores 
of the same Ss up to the age of 17 years 
(the most recent administration), and on 
the Wechsler-Bellevue up to the age of 21 
years (most recent administration). More- 
over, a few 25-year scores available at the 
time of writing indicated that the ceiling 
of mental growth had not yet been reached 
by these Ss. The interpretation of these 
results, of course, must take account of 
the possibility of practice effects from fre- 
quent retesting. Owens (8) found an in- 
crease on Army Alpha scores by Ss who 
took the test initially at 19 years and again 
at 50 years. Bayley and Oden (2) in the 
most recent examination of Terman’s 
Gifted Children found increases in scores 
on the Concept Mastery Test for Ss cov- 
ering a total range from age 20 to age 50 
years, retested after a 12-year interval. 

It seems obvious that a correctional fac- 
tor of some sort should be applied to the 
adult Stanford-Binet IQs. Decision on the 
nature of such a correction for purposes of 
analyzing data of the present study is be- 
ing deferred until the pending publication 
of the new revision. It may be noted, how- 
ever, that whereas the IQs of 42% of the 
present group changed more than 10 points 
from 1931 to 1941, the IQs of 60% changed 
more than 10 points from 1931 to 1956. If 
the IQ is to be meaningful as an index, 
the average gain should be zero as it was 
between 1931 and 1941. When the IQs 
are equalized by taking account of the 
mean increase in IQ and subtracting 11 
points from the 1956 value, the frequency 
of changes greater than 10 points is re- 
duced to 42%. Similarly, 22% of the pres- 
ent group changed more than 15 points 
from 1931 to 1941, whereas 41% changed 
more than 15 points from 1931 to 1956.
Equalizing the IQs as described above, the 
latter is reduced to 22%. From 1941 to 
1956 the equalizing of the values reduces 
those changing more than 10 points from 
58% to 18% and those changing more than 
15 points from 28% to 7%. 
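
The equalizing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `share_changed` and the five pairs of scores are hypothetical and are not data from the study; the only element taken from the text is the idea of subtracting the mean gain (11 points) from the later IQ before counting changes greater than a given number of points.

```python
def share_changed(earlier, later, threshold, correction=0.0):
    """Fraction of Ss whose IQ changed by more than `threshold` points.

    `correction` is subtracted from each later score first, which is the
    "equalizing" step described in the text (11 points for the 1956 IQs).
    """
    changes = [abs((b - correction) - a) for a, b in zip(earlier, later)]
    return sum(c > threshold for c in changes) / len(changes)


# Hypothetical scores for five Ss (illustration only, NOT the study's data).
iq_earlier = [100, 105, 110, 120, 95]
iq_later = [112, 114, 123, 128, 108]

raw = share_changed(iq_earlier, iq_later, threshold=10)
equalized = share_changed(iq_earlier, iq_later, threshold=10, correction=11)
print(raw, equalized)
```

With these made-up scores, most of the large raw changes disappear once the mean gain is removed, which is the same direction of effect the authors report for the actual group.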


DISCUSSION


Beyond the problem of determining the 
optimal method to use in computing adult 
indices of intelligence is the question of 
what implications these data have for the 
nature of the mental growth curve and the 
theoretical question of terminal age of in- 
tellectual growth. A choice of method to 
use in arriving at the most useful index of 
adult intelligence requires the considera- 
tion of many factors. Wechsler’s use of 
IQs computed by setting a mean of 100 
and a standard deviation of 15 for each 
age is one solution. The central problem, 
however, is that intelligence, like so much 
of human behavior, is multifaceted. The 
point of view from which the question is
approached depends upon what aspect is
to be explored. Are we interested in an index
showing one’s place relative to others
of the same age group, an index of one’s
location on a scale of development and decline,
or an assessment of one’s various kinds
of mental processes? Inasmuch as differ- 
ent abilities reach maturity at different 
ages and begin to decline at different ages,
the particular abilities examined become
deciding factors. For example, Corsini and
Fassett (6) found from the testing of 1072
adults on the Wechsler-Bellevue that general
intelligence declines from early to
late maturity only if visual and motor
factors are tested, and increases if the
items are dependent on continued learning.
We have in progress further research
directed at determining what factors are
associated with fluctuation in measured
intelligence throughout the life span and
the course of development and decline of
different kinds of intelligence.


SUMMARY

Of the 212 preschool children living in 
the San Francisco Bay Area who composed 
part of the standardization group of the 
1937 Revision of the Stanford-Binet, 111 
were retested as adults 25 years later. The 
sample was somewhat biased, with a tend- 
ency for Ss at the lower IQ levels to be 
more difficult to locate or to refuse to par- 
ticipate. Thus the initial mean IQ of those 
retested was 112.8 as compared with 105.4
for the total group. 

The Pearsonian r between adult and
preschool Stanford-Binet IQs over the 25-year
period is .59. The mean IQ showed a
rise of 10.8 points in 25 years. This rise
occurred in the years after early adolescence,
there being no increase in mean IQ
between preschool and adolescent testings.
This is interpreted as invalidating the
assumption that intelligence stops increasing
at 16 years. Further research is in
progress.


REFERENCES 
1. Bayley, Nancy. On the growth of intelligence.
Amer. Psychologist, 1955, 10,
805-818.

2. Bayley, Nancy, & Oden, Melita H. The
maintenance of intellectual ability in
gifted adults. J. Geront., 1955, 10, 91-107.

3. Bradway, Katherine P. IQ constancy on
the Revised Stanford-Binet from the
preschool to the junior high school
level. J. genet. Psychol., 1944, 65, 197-217.

4. Bradway, Katherine P. An experimental
study of factors associated with Stanford-Binet
IQ changes from the preschool
to the junior high school. J.
genet. Psychol., 1945, 66, 107-128.

5. Bradway, Katherine P. Predictive value
of Stanford-Binet preschool items. J.
educ. Psychol., 1945, 36, 1-16.

6. Corsini, R. J., & Fassett, K. K. Intelligence
and aging. J. genet. Psychol.,
1953, 83, 249-264.

7. McNemar, Q. The Revision of the Stanford-Binet
Scale. Boston: Houghton
Mifflin, 1942.

8. Owens, W. A. Age and mental abilities.
Genet. Psychol. Monogr., 1953, 48, 3-54.

9. Terman, L. M., & Merrill, Maud A.
Measuring Intelligence. Boston: Houghton
Mifflin, 1937.




Received June 18, 1958. 


JOURNAL OF EDUCATIONAL PSYCHOLOGY
Vol. 49, No. 5, 1958


THE ADEQUACY OF “MEANING” AS AN EXPLANATION FOR 
THE SUPERIORITY OF LEARNING BY 
INDEPENDENT DISCOVERY! 


BERT Y. KERSH 


University of Oregon 


The hypothesis that learning through 
independent discovery is superior to learn- 
ing by rote is well supported by existing 
research evidence (5, 6, 7). More recently, 
however, evidence has been published 
which suggests that attempts to direct 
the learner in the discovery process may 
also be successful without loss in retention 
or transfer (2, 3). 

One reasonable explanation for the su- 
periority of both the independent discov- 
ery and the directed discovery processes 
is that learning under either condition is 
more meaningful than in the case where 
the learner simply memorizes answers. 
Meaning is used in the following report 
in the cognitive sense of understanding or 
organization. More precisely, through the 
discovery process, in which the learner is 
forced to rely on his own cognitive capaci- 
ties, he becomes cognizant of the relation- 
ships of the learning task to his previous 
experience, or to the pattern of relation- 
ships among the elements of the task. The 
superiority of such meaningful learning 
over rote learning is also well supported by 
research evidence (1, 4).

If meaningful learning is the key con- 
cept, it should make no difference whether 
learning occurs with or without direction, 
so long as the learner becomes cognizant 
of the essential relationships. However, it 
is very likely that some procedures of
learning may be superior to others simply
because they are more likely to cause the
learner to become cognizant of the relationships.

1 This study was supported by research
grants from The Graduate School of the
University of Oregon, Eugene, Oregon. Research
assistance was ably provided by

The purpose of this research was to 
study the process of learning tasks involv- 
ing arithmetical and geometrical relation- 
ships in order to determine whether or not 
the superiority of the discovery and di- 
rected-discovery procedure is adequately 
explained in terms of “meaningful learn- 
ing,” and, if not, to discover a more ade- 
quate explanation. 


DESCRIPTION OF THE TASKS 


Each S had the task of learning the 
following two rules of addition. 

1. The Odd-numbers rule. The sum of any 
series of consecutive odd numbers beginning 
with one is equal to the square of the num- 
ber of figures in the series. (For example, 
1, 3, 5, 7, is such a series; there are four 
numbers, so 4 times 4 is 16, the sum.) 

2. The Constant-difference rule. The sum 
of any series of numbers in which the dif- 
ference between the numbers is constant 
is equal to one-half the product of the 
number of figures and the sum of the first 
and last numbers. (For example, 2, 3, 4, 5, 
is such a series; 2 and 5 are 7; there are
four figures, so 4 times 7 is 28; half of 28 
is 14 which is the sum.) 
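
The two rules can be checked mechanically. The following is a minimal Python sketch, not part of the original experiment; the function names `odd_numbers_sum` and `constant_difference_sum` are hypothetical labels chosen here, and the check simply compares each rule against ordinary addition for the two examples given above and for a range of other series.

```python
def odd_numbers_sum(n):
    """Odd-numbers rule: the sum of the first n consecutive odd numbers
    beginning with one equals n squared."""
    return n * n


def constant_difference_sum(first, last, n):
    """Constant-difference rule: one-half the product of the number of
    figures and the sum of the first and last numbers."""
    # For an integer series with a constant difference, n * (first + last)
    # is always even, so integer division is exact.
    return n * (first + last) // 2


# The two examples given in the text.
assert odd_numbers_sum(4) == sum([1, 3, 5, 7]) == 16
assert constant_difference_sum(2, 5, 4) == sum([2, 3, 4, 5]) == 14

# Compare both rules with ordinary addition over many series.
for n in range(1, 21):
    odds = [2 * k + 1 for k in range(n)]
    assert odd_numbers_sum(n) == sum(odds)
    for start in range(0, 10):
        for diff in range(1, 6):
            series = [start + diff * k for k in range(n)]
            assert constant_difference_sum(series[0], series[-1], n) == sum(series)
```

The Odd-numbers rule is a special case of the Constant-difference rule: for 1, 3, ..., 2n - 1 the first and last numbers sum to 2n, so half of n times 2n is n squared.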


These rules can be learned by simply 
memorizing the task procedure as above.
On the other hand, the learner can become
cognizant of certain relationships to geometrical
and arithmetical concepts which
the two rules involve, in which case his 
learning will be more meaningful. 

Consider first the Odd-numbers rule. In
the first place, it is possible to relate the
arithmetical concept of “squaring a number”
to the geometrical concept of a
“square.” For example, “4²” may be conceptualized
as the arithmetical counterpart
to the geometrical concept of a square
having 4 units to the side.

It can readily be shown that any series of
consecutive odd numbers beginning with one
can be rearranged in the form of a geometrical
square, as is illustrated in Fig. 1.
In Figure 1, each row of X’s represents a
number; the first row is “1,” the second
row is “3,” and so on. The X’s which are
outside the square can be fitted into
the spaces indicated by the dashes inside
the box.

The rule can also be related to the concept
of the arithmetical average. The mean
of any such series is the middle number,
and the mean is always equal to the number
of figures in the series. So the mean
times the number of figures is equivalent to
the number squared. This relationship is
suggested in Fig. 2.

Consider now the Constant-difference
rule. The arithmetical and geometrical
relationships involved in this rule are suggested
in Fig. 3 and 4. Each figure essentially
shows the original series inverted
and added across; and it becomes
apparent that the sum of the first and the
last number, the sum of the second and
the next-to-the-last number, and so
on, is the same. The sum of the original
series is, of course, one half the sum of the
column of sevens formed in Fig. 3, and
half the area of the rectangle thus formed
in Fig. 4.

Fig. 2.

Fig. 3.
2 + 5 = 7
3 + 4 = 7
4 + 3 = 7
5 + 2 = 7
    14  28

Fig. 4.


HYPOTHESES


The experiment was designed to test the
following hypotheses:

1. It makes no difference in terms of retention
or transfer effects whether the learner
discovers the relationships which are essential
to the understanding of a cognitive
task independently or with external direction.

2. It is more probable that a learner becomes
cognizant of the relationships which
are essential to understanding a cognitive
task when his attention is directed to the
relationships than when his attention is
directed to the task procedure alone, or when
he is required to learn independently.

3. It follows from the second hypothesis 
that it is more probable that what is learned 
is remembered longer and transferred more 
effectively when the learner’s attention is 
directed to the essential relationships than 
when his attention is directed to the task 
procedure alone, or when he is required to 
learn independently. 

4. In the learning of tasks which involve
both arithmetical and geometrical relation- 
ships, the learner most probably becomes 
cognizant of the former when conventional 
Hindu-Arabic symbols are used in directing 
his learning, and of the latter when more 
nearly iconic symbols are used. 


PROCEDURE 
One group of Ss, called the “no-help” 
group, was required to discover the rules
without help. A second group, called the 
“direct-reference” group, was given some 
direction in the form of perceptual aids, 
such as those illustrated in Fig. 1-4 above,
with accompanying verbal instructions 
which directed their attention to the per- 
ceptual aids. A third group, called the 
“rule-given” group, was told the rules di- 
rectly and was given practice in applying 
them, without any reference to the arith- 
metical or geometrical relationships. The 
three procedures were called, as a group, 
the “teaching treatments.” 

In addition, there were two treatments 
called the “number treatments,” which 
consisted simply of presenting the prob- 
lems in either the more nearly iconic form, 
called the “X-form,” or the conventional 
Hindu-Arabic form, called the “A-form.” 

The three teaching treatments and the 
two number treatments were combined to 
form six treatment combinations in a 2 × 3
factorial design. The treatment combinations
were identified as follows: No help,
A-form (No-help A); No help, X-form 
(No-help X); Direct reference, A-form 
(Dir-ref A); Direct reference, X-form 
(Dir-ref X); Rule given, A-form (Rul- 
giv A); Rule given, X-form (Rul-giv X). 

A total of 60 college student volunteers 
from E’s two sections of Educational Psy- 
chology, taught in the Spring of 1957, 
formed the original sample. The Ss were 
divided into six equal groups through the 
use of a table of random numbers. 
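
The design and the assignment step can be sketched in Python as follows. This is an illustrative sketch only: the study used a table of random numbers, for which a seeded pseudo-random shuffle is substituted here, and the names `treatment_combinations` and `assign`, as well as the subject numbering 1 to 60, are hypothetical.

```python
import itertools
import random

# Abbreviations as given in the text.
TEACHING = ["No-help", "Dir-ref", "Rul-giv"]   # three teaching treatments
NUMBER = ["A", "X"]                            # two number treatments


def treatment_combinations():
    """Return the six cells of the 2 x 3 factorial design."""
    return [f"{t} {n}" for t, n in itertools.product(TEACHING, NUMBER)]


def assign(subject_ids, seed=None):
    """Randomly divide subjects into equal groups, one per treatment cell.

    Assumes the number of subjects is a multiple of the number of cells;
    any remainder would be left unassigned.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)  # stands in for the table of random numbers
    cells = treatment_combinations()
    size = len(ids) // len(cells)
    return {cell: ids[i * size:(i + 1) * size] for i, cell in enumerate(cells)}


groups = assign(range(1, 61), seed=0)
for cell, members in groups.items():
    print(cell, len(members))
```

Each of the six cells receives 10 of the 60 Ss, matching the six equal groups described above.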

The groups were compared in age, sex, 
grade level, and scholastic aptitude, and 
the differences were judged to be insignifi- 
cant. However, nine of the original group 
were eliminated either because the record- 
ings were defective or because they failed 
to complete the experiment. This left a 
total of eight in each group except one 
which had 10 remaining. Two more Ss 
were eliminated from the group with 10 
Ss by the use of a table of random numbers
so as to make the groups equal in
number. The remaining 48 Ss provided 
the data which were used in the analysis. 

The procedure was to have each of the 
Ss attempt to learn the two rules to the 
point where he could verbalize the rules 
and apply them in the solution of three 
different problems in succession. Tested 
individually, the Ss were asked to “think 
aloud” during the learning period, and 
voice recordings were made. 

Immediately following the learning pe- 
riod, a test consisting of 20 problems was 
given. Then, approximately four weeks 
later, the Ss were asked to solve two sim- 
ple problems and to fill out a questionnaire 
on their process of thinking. 

The learning problems and the test 
problems were reproduced on separate 
pieces of paper or cardboard so that they 
could be presented one at a time. The 
experimental procedure may best be de- 
scribed step-wise, as follows: 

Step 1. General instructions. The follow- 
ing instructions were given to each S. 


I am going to show you a series of addi- 
tion problems one at a time, and you are 
to try to discover how to find the sum of 
each problem without adding in the usual
manner. 


Or, in the event the rule was to be 
given, the initial statement was as follows: 


I am going to tell you how to find the 
sum of two kinds of number series without 
adding, and then I will give you some practice
in using the rules.


The remaining general instructions were
the same for each experimental group.


There are two types of problems, and 
there is a different rule for each of the two
types. 


At this point two sample problems were
presented, as illustrated in Fig. 5. 


Fig. 5. Sample problems in the A-form (N = 5) and in the X-form (N = XXXXX).


The first type, called the Odd-numbers
problem, consists of a series of odd numbers
beginning with one. Here is an example.
(Point to the sample problem in the A-form.)
Below the series you are given the
number of figures in the series. In this case,
there are five figures, so the “number” is
five.

Now look at the problem on the right.
This problem is the same as the one on the
left, except that the numbers are represented
by rows of X’s. Each row represents a number.
Do you see? (Explained, if necessary.)

The second type problem is called the
Constant-difference problem. This type consists
of numbers which increase by a certain
amount. The series may begin with any
number and be of any length, but the
numbers will always increase by a certain
amount. In the example, the numbers increase
by one. (Point to the sample problem
in the A-form.) Again, on the right is
the same problem in the X-form.


From this point the instructions were
different for each treatment
combination. For the two no-help groups
the instructions were as follows.


See if you can discover the rule for the
Odd-numbers type problem first. Keep in
mind that you are trying to discover a way
of finding the sum of this type problem
without adding in the usual manner. The
problems I show you will all be examples of
this type problem in the X-form (or conventional
form).


At this point the first “learning” problem
was presented. The learning problems
were the same as the sample problem
in Fig. 5, but were all in the appropriate
number form.


I am going to record what you are thinking,
so “think aloud” as much as possible. Tell
me what you are thinking, in other words.


After S learned the rule to the Odd-numbers
type problem, the Constant-difference
problem was introduced by simply
saying, “Now let’s see if you can discover
the rule for the Constant-difference type
problem.”

The specific instructions to the two
direct-reference groups were as follows:


See if you can discover the rule for the 
Odd-numbers type problem first. Keep in 
mind that you are trying to discover a way 
of finding the sum of this type problem 
without adding in the usual manner. The 
problems I show you will all be examples 
of this type problem in the X-form (or con- 


ventional form). 
In order to help you discover the rules, 


these examples provide you with additional 
hints. I’ll show you these one at a time and 
explain the hints to you. 


The first problem was presented at this 
point. Examples of the learning problems 
used with the Dir-ref X and Dir-ref A 
groups are illustrated in Fig. 1 and 2,
respectively.

At this point, the instructions to the
Dir-ref A group continued as follows.


The problem will always be inside the box, 
and the hint will be to the right of the box 
where you see the column of fours. 

Notice that there are four 4’s in the col- 
umn, the same number of figures as there 
are in the problem series. Also notice that 
when you add the two columns, you get the 
same sum. Does that give you any ideas? 

I want a record of what you are thinking, 
so “think aloud” as much as possible. Tell 
me what you are thinking, in other words. 


After S learned the rule to the Odd- 
numbers type problem, the Constant-dif- 
ference problem was introduced in the fol- 
lowing manner: 


Now let’s see if you can discover the rule 
for the Constant-difference type problem. 
Again, the problem will always be inside the 
box, and the hint will be to the right of the
box.

Essentially what we have done here is to
take the problem series, invert it, and add 
across in the manner indicated. When we 
do this, you notice that we always get the 
same number, in this case 7. In other words, 
when you add the first number to the last 
number, you get 7; when you add the sec- 
ond number to the next-to-the-last, you get 
7, and so on. 

Also notice that the sum of the first col- 
umn to the right is the same as that of the 
problem series, and the sum of the second 
column, the column of 7’s, is twice the sum 
of the problem series. Now, does that sug- 
gest to you any ideas? 


Starting again at the point at which the 
first problem is presented, the instructions 
to the Dir-ref X group continued as fol- 
lows. 


In these examples, the lines enclosing 
some of the X’s and the dashes in a box are 
the hints. Notice that the number of dashes 
in the box is the same as the number of X’s 
remaining outside. In other words, the total 
number of X’s in the problem can be re- 
arranged in the form of the box. Does this 
give you any ideas? Tell me what you are 
thinking. “Think aloud,” in other words. 


After the Odd-numbers rule was dis-
covered, the Constant-difference problem
was introduced as follows.


Now let’s see if you can discover the 
rule for the Constant-difference type prob- 
lem. Again the lines and dashes are your 
hints. Notice in this case that the number 
of dashes equals the number of X’s. Essen- 
tially what we have done here is to invert 
the problem series and attach it to the origi- 
nal in the manner indicated to form the 
box. In other words, when you add the first 
row to the last you get 7; when you add 
the second row to the next-to-the-last, you
get 7, and so on. In the end you have twice
as many spaces within the box as there are 
X’s in the original problem. Does that give 
you any ideas? 


For the rule-given groups, the instruc- 
tions were as follows: 


The problems I will show you first are
all examples of the Odd-numbers type prob-
lem in the X-form (or conventional form).
The rule is to square the number of figures
in the series to get the sum. The number of
figures is 4. Four squared, or 4 times 4, is
16, which is the sum.

Now I will show you some more examples 
of this type problem. See if you can apply 
the rule correctly to each. Tell me what you
are thinking. 


After learning the Odd-numbers rule to
the established criterion of three success-
ful applications in succession, the Con-
stant-difference type problem was intro-
duced as follows.


Now let’s see if you can learn the rule
to the Constant-difference type problem.
The rule is to add the first and the last
numbers in the series, multiply by the num-
ber of figures in the series, and then divide
by two. With the first example, add the first
and the last numbers (2 and 5 are 7), mul-
tiply by the number (4 times 7 is 28), and
divide by two (28 divided by 2 is 14). Four-
teen is the correct sum. Now you practice
the rule on these other examples.
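
The two rules stated in these instructions can be written out as a minimal sketch. The function names are illustrative only and do not come from the study; the worked values (four figures summing to 16, and the series 2 through 5 summing to 14) are the examples given in the instructions.

```python
def sum_odd_numbers(n: int) -> int:
    """Odd-numbers rule: square the number of figures in the series."""
    return n * n


def sum_constant_difference(first: int, last: int, n: int) -> int:
    """Constant-difference rule: add the first and last numbers,
    multiply by the number of figures, and divide by two."""
    return (first + last) * n // 2


# Worked examples from the instructions, checked against plain addition.
odd_example = sum_odd_numbers(4)                 # 1 + 3 + 5 + 7
constant_example = sum_constant_difference(2, 5, 4)  # 2 + 3 + 4 + 5
print(odd_example, constant_example)
```

Both rules give the same result as adding the series term by term, which is what S was expected to notice when applying them.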


Step 2. The learning period. The learn-
ing period was considered to begin imme-
diately after the instructions in Step 1
were given, and to end immediately after
the S successfully applied an acceptable
rule to three problems in succession. An
acceptable rule was defined as one which
could be used in the solution of the prob-
lems as presented in the learning period,
even though it might be inadequate for use
with the problems as presented in the test
period. For example, a rule which was fre-
quently considered is to multiply the mid-
dle number in the series by the number
of figures. This was adequate during the
learning period because the problems in-
cluded all the numbers; however, the test
problems included only the first three num-
bers and the last of the series, which made
the rule awkward to apply. This criterion
forced E to deviate slightly from the ex-
perimental procedure in some cases, as will
be described below.
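
A short sketch may make clear why the middle-number rule is adequate in one setting and awkward in the other. It assumes a series with a constant difference between terms; the function names and the sample series 3, 5, 7, ..., 21 are hypothetical and are not taken from the test materials.

```python
from statistics import median


def sum_by_middle_number(series):
    """Middle-number rule: multiply the median of the series by the
    number of figures. Requires every term to be visible."""
    return median(series) * len(series)


def sum_from_partial_series(first_three, last):
    """With only the first three numbers and the last one shown, the
    number of figures must first be recovered from the common
    difference before any rule can be applied."""
    first, second, _third = first_three
    difference = second - first
    n = (last - first) // difference + 1
    return (first + last) * n // 2


full_series_sum = sum_by_middle_number([2, 3, 4, 5])
partial_series_sum = sum_from_partial_series((3, 5, 7), 21)
print(full_series_sum, partial_series_sum)
```

When the whole series is shown, the median can be read off directly. When only four terms are shown, the median is not among them, so the first-plus-last rule is the more convenient one.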

Six different problems were used in the
learning period, and they were basically
the same for each group. The general pro-
cedure was to wait until S had developed
a possible solution with one problem, then
to give him the next problem, telling him
to try it on another example. This pro-
cess was continued until S was success-
ful. S was permitted to manipulate the
[illegible] himself if he wished, to back-track,
and to look at several problems simultane-
ously. Also, E repeated the instructions
when asked, and answered questions re-
garding the task. Otherwise, E limited his
remarks to those which were intended to
give support and encouragement, such as,
“Try your idea on the next problem,”
“You’re doing very well,” and “See if it
works.” In order to keep S thinking aloud,
E would ask, “What are you thinking?”
whenever S was silent for more than
[illegible] or 15 seconds.

In the event that S discovered a rule
which appeared to work satisfactorily but
which was not the intended rule, E first
tried to encourage him to continue
searching by saying, “That’s fine, now see
if you can discover another rule.” In the
event that S perseverated, E would shift
to the problems used during the first test
period and, as soon as S became aware of
the inadequacy of his rule, would shift
again to the learning problems.

There were some individuals, particu-
larly in the no-help groups, who were
unable to discover the intended
rule within the time period of 60 to 90
minutes scheduled for each S. In this
case the learning period was terminated.
Step 3. The first test period. Immedi-
ately following the learning period, S was
given 20 problems to solve, five each of
the following: Odd-numbers problems
(A-form), Odd-numbers problems (X-
form), Constant-difference problems (A-
form), Constant-difference problems (X-
form). The 10 problems of each type were ac-
tually five problems presented once in the
A-form and once in the X-form. The prob-
lems were grouped by type and numeri-
cal form and were clearly identified as
such. The procedure was to present the
Odd-numbers problems first, then the Con-
stant-difference problems. Furthermore,
the Ss who learned the rules using prob-
lems in the A-form were presented the test
problems in the A-form first.

Step 4. The retest period. A retest was 
given to all Ss four to six weeks after the 
first test. The second test consisted of two 
problems which could most easily be 
solved by using the two rules which the 
Ss attempted to learn during the learning 
period, and a series of questions on their 
process of thinking. The problems and 
questions were completed under super- 
vision, all in one day. The instructions 
were to record all scratchwork in spaces 
provided for this purpose, and to write 
the answers to the questions as clearly and 
completely as possible. The problems were 
as follows: (a) What is the sum of the 
first 35 odd numbers, and (b) what is the 
sum of all the first 35 numbers? The ques-
tions which followed each problem were
the following:


Did you add the first 35 (odd) numbers
to get your answer? (Yes or no) If your
answer is no, explain how you obtained your
answer.

Did you try to recall the rule you learned
(or attempted to learn) under our direc-
tion several weeks ago? (Yes or no)

Were you successful in recalling the rule?
(Yes or no)

Describe how you recalled or attempted
to recall the rule.
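

For reference, the two retest problems can be worked both by rule and by simple addition. This is only an illustrative check; the variable names are not part of the original materials.

```python
n = 35

# (a) Sum of the first 35 odd numbers: by the Odd-numbers rule and by addition.
odd_sum_by_rule = n ** 2
odd_sum_by_addition = sum(2 * k - 1 for k in range(1, n + 1))

# (b) Sum of all the first 35 numbers: by the Constant-difference rule
# (first number 1, last number 35) and by addition.
all_sum_by_rule = (1 + n) * n // 2
all_sum_by_addition = sum(range(1, n + 1))

print(odd_sum_by_rule, all_sum_by_rule)
```

An S who recalled either rule could reach the answer in one or two steps, whereas an S who added needed 34 successive additions for each problem.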


ANALYSIS OF THE RESULTS 


Not all Ss succeeded in learning the two 
rules as stated above to the established 
criterion. Some of those who failed did 
manage to discover other workable varia- 
tions of the rules which were judged to be 
acceptable. Others, however, failed to learn 
any workable rule for one of the two tasks 
before the practical limitations of time
forced E to terminate the learning period.

TABLE 1
NUMBER WHO LEARNED EACH RULE (OR AN
ACCEPTABLE VARIATION) TO CRITERION
DURING THE LEARNING PERIOD

                          No-help   Dir-ref   Rul-giv
Rule^a                     X   A     X   A     X   A    All
Odd-numbers rule
  1. n²                    7   6     8   7     8   8     44
  2. m²                    1   0     0   0     0   0      1
  3. mn                    0   1     0   0     0   0      1
  4. (a + l) [illegible]   0   0     0   1     0   0      1
  5. none                  0   1     0   0     0   0      1
  All rules                8   8     8   8     8   8     48
Constant-difference rule
  1. ½n(a + l)             1   0     7   7     8   8     31
  2. mn                    1   1     0   0     0   0      2
  3. none                  6   7     1   1     0   0     15
  All rules                8   8     8   8     8   8     48

^a In this column, n = number of figures in series;
m = mean, or median of series; a = first number, and
l = last number in series.


Table 1 shows the number of those who 
learned the rules and each of the accept- 
able variations, and those who failed to 
learn any workable rule in each case. 

Those who failed to learn an acceptable 
rule were retained in the experiment and 
retested along with the others four weeks 
later, in anticipation of the possibility 
that their performance on the retest prob- 
lems and their responses to the retest 
questionnaire would reveal useful informa- 
tion for further research. This indeed 
proved to be the case. 

The purpose of the 20-item test given 
immediately after the learning period was 
to detect differences among the Ss in their 
achievement, if any. All Ss who learned 
the stated rules to criterion scored per- 
fectly on the test, and those who failed to 
learn the acceptable rules were, of course, 
unable to solve the problems without 
adding. 

On the retest given four weeks later, the
datum of primary interest was the method
used in solving the problems, not whether
or not the correct answer was provided.
Table 2 shows the methods used on the re- 
test and the number of Ss in each group 
who used each method. 

A comparison of Table 2 with Table 1 
reveals the changes in the learned methods 
and the methods used on the retest, but 
does not indicate whether or not the meth- 
ods that were used on the retest were the 
same as those which were developed dur- 
ing the learning period. It is apparent that 
there were at least three possibilities: (a) 
the same method was used, (b) a differ- 
ent method was used, or (c) the problem 
was solved by simple addition, which 
would indicate complete failure to recall 
or transfer the rule if learned. Table 3 
shows the number of Ss in each of the 
teaching treatment groups who, on the re- 
test, used the same method, some other 
method than that which they had learned, 
or simple addition. The Ss who failed to 
learn any acceptable rule in the learning 
period were placed in the “added” cate- 
gory if they added on the retest, and in 
the “other” category if they attempted 
some other procedure on the retest. 

The results presented above fail to sup- 
port the hypothesis that the Ss whose at- 
tention is directed to the relationships 
which are essential to understanding re-
member their learning longer and transfer
it more effectively than do the Ss of the 
other two treatment groups (Hypothesis 
3, above). The prediction was that the di- 
rect-reference group would be superior. In- 
stead, although the obtained differences 
are not highly reliable, they are consistent 
with previously published data to the ex- 
tent that they suggest that the independ- 
ent discovery procedure is superior to
learning by rote. 

The reader’s attention is directed to the
fact that although 13 Ss in the no-help
groups failed to learn an acceptable rule
for the Constant-difference problem dur-
ing the learning period, there were only
four who added on the retest, and 10 who

TABLE 2
METHODS USED ON THE RETEST

Column groups: No-help, Dir-ref, and Rul-giv (A and X
under each), all number treatments, and all teaching
treatments. [Cell frequencies illegible.]

Method^a

Problem 1. Sum of the first 35 odd numbers
  1. [illegible]
  2. ½n(a + l)
  3. [illegible] (a + l)
  4. mn
  5. Not acceptable
  6. By addition
  Chi square     Not applicable    1.39*    11.18**

Problem 2. Sum of all the first 35 numbers
  1. ½n(a + l)
  2. mn
  3. Not acceptable
  4. By addition
  Chi square     Not applicable    1.36*    3.86**

^a In this column, n = number of figures in series;
m = mean, or median of series; a = first number, and
l = last number in series.
^b Frequencies above and below line combined for
chi-square analysis.
* Not significant at the 0.05 level.
** Significant at the 0.05 level.

TABLE 3
CORRESPONDENCE OF METHODS USED ON THE RETEST WITH THOSE
LEARNED DURING THE LEARNING PERIOD

Column groups: No-help, Dir-ref, Rul-giv, and all number
treatments (A and X under each), and all teaching
treatments (No-help, Dir-ref, Rul-giv).
[Cell frequencies illegible.]

Problem 1. Sum of the first 35 odd numbers
  Same
  Added
  Others
  Chi square     Not applicable    3.48     8.21

Problem 2. Sum of all the first 35 numbers
  Same
  Added
  Other
  Chi square     Not applicable    3.03*    9.99**

* Not significant at the 0.05 level.
** Significant at the 0.05 level.


TABLE 4
NUMBER WHO WERE JUDGED TO BE COGNIZANT OF EACH TYPE
OF RELATIONSHIP DURING THE LEARNING PERIOD

                          No-help   Dir-ref   Rul-giv   All
Relationship               A   X     A   X     A   X    treatments
Odd-numbers rule
  Arithmetical             1   1     1   0     0   0      3
  Geometrical              0   2     0   4     0   1      7
  Both                     0   0     0   1     0   0      1
  Neither                  7   5     7   3     8   7     37
Constant-difference rule
  Arithmetical             3   1     6   0     0   0     10
  Geometrical              0   0     0   2     0   0      2
  Both                     0   0     0   0     0   0      0
  Neither                  5   7     2   6     8   8     36


used acceptable methods. In the case of 
the other two teaching treatment groups, 
the number who added increased, and the 
number who used acceptable rules de- 
creased rather markedly. This finding sug- 
gests that as a result of their experience 
during the learning period, the Ss in the 
no-help groups were motivated to continue 
learning afterwards and those treated 
otherwise were not. 

The superiority of the discovery pro- 
cedure may be better explained in terms 
of motivation, then, than in terms of un- 
derstanding. This suggestion is given added 
support by the following data. 

In an attempt to determine the num- 
ber of Ss in each experimental group who 
actually became cognizant of the arith- 
metical, the geometrical, or both types of 
relationship, the typed transcriptions of 
the voice recordings made during the 
learning period were carefully examined. 
Any S was judged to be cognizant of a 
relationship if he verbalized it at any point 
in the process of his learning. As a result
of E’s continued efforts to stimulate each
S to think aloud during the learning pe-
riod, very complete records were obtained.
In addition, each S who failed to volunteer
an explanation of the rule was asked to
do so at the end of the period. In most
cases of the latter type, the S’s response
was that he could not provide an explana-
tion. Table 4 shows the results of this
analysis.

Most striking is the fact that only one- 
quarter of the entire group of 48 Ss was 
judged to be cognizant of one or both 
types of relationship to each of the two 
rules. The small number of Ss precludes 
any statistical check of the reliability of 
the obtained difference in the experi- 
mental groups. However, accepting the 
obtained data as reliable, the results ap- 
pear to support the prediction that the 
direct-reference treatment would produce 
the greatest incidence of understanding. 
The next highest frequency of understand- 
ing was in the no-help group, and even in
the rule-given group there was at least 
one S who understood the odd-numbers 
rule. Furthermore, with only two excep- 
tions, both in the no-help group, the type 
of relationship discovered corresponded to 
the type that was predicted from the num- 
ber treatment. When the problems were 
presented in the A-form, the arithmetical 
relationship was discovered, and when the 
X-form was used, the geometrical rela- 
tionship was discovered. The data, there-
fore, tend to support Hypotheses 2 and 4.

The data presented in Table 4 do not 
preclude the possibility, however, that the 
Ss who were not cognizant of the relation- 
ships at the end of the learning period 
subsequently did become cognizant of 
them. The most adequate test of the effi- 
cacy of understanding in learning is to de- 
termine how those who actually verbalized 
the relationships performed on the retest 
problems. Table 5 shows the number of 
those who, on the retest, used the same 
rule they learned, those who simply added,
and those who used some other method 
than that learned. 

Again, the dearth of data does not per-
mit a conclusive test of the first hypothesis.
However, since only about one-half of
those who learned acceptable rules with
understanding during the learning period
succeeded in retaining and transferring
their learning four weeks later, the data
do not offer very conclusive support of
meaning theory.


TABLE 5
METHODS USED ON RETEST OF THOSE WHO
WERE JUDGED TO BE COGNIZANT OF THE
ESSENTIAL RELATIONSHIPS

                  No-help   Dir-ref   Rul-giv
Methods            A   X     A   X     A   X    All
Problem 1
  Same rule        1   3     0   2     0   0      6
  Other            0   0     0   1     0   1      2
  Added            0   0     1   2     0   0      3
  All methods      1   3     1   5     0   1     11
Problem 2
  Same rule        2   1     3   0     0   0      6
  Other            1   0     0   2     0   0      3
  Added            0   0     3   0     0   0      3
  All methods      3   1     6   2     0   0     12


Discussion 


Perhaps the data which have been pre-
sented are sufficient to suggest the inade-
quacy of the “meaning” explanation, but
they fail to communicate the impression
of the increased motivation or inter-
est in the task which characterized many
of the Ss in the no-help group as com-
pared with many of those in the other
groups. The difference in motivation was
reflected in their written comments on the
retest and verbal reports to E. For ex-
ample, one S in the no-help group re-
ported that he was so intrigued with his
experience in discovering the rules that he
told his friends of his experience and
tested their ability. Others in the no-help
group who failed to discover the rule told
of their efforts to learn the rule, even go-
ing so far as to look up the algebraic
formula in the library. On the other hand,
one S in the rule-given group complained 
that E had not instructed him to remem- 
ber the rules, so he had promptly forgot- 
ten them. Others in the rule-given and 
direct-reference groups commented on 
their confusion during the learning period 
in explaining their inability to recall the 
rules. The evaluation of the Ss’ comments 
was admittedly highly subjective, but it 
may better suggest the nature of the mo- 
tivation which is offered as the more ade- 
quate explanation for the superiority of the 
discovery procedure. 

Of the various descriptive concepts of 
human motivation known to this present 
writer, Allport’s concept of functional 
autonomy best describes the motivation 
that was developed by those in the no-help 
group. Whereas, at the beginning of the 
learning period, all the participants were 
assumed to be alike in that they were mo- 
tivated primarily by such extrinsic factors 
as the approval of E, at the end of the
learning period, the motivation of many 
of those in the no-help group appeared to 
be independent of the experimental or in- 
structional situation. Presumably, the mo- 
tivating power is of the type that lies in 
acquired interest or ego involvement in a 
task, and develops to the extent that the 
individual relies on his own cognitive ca- 
pacities in learning. 

The findings place the teacher in some- 
what of a dilemma. He must decide which 
is the more important outcome of learn- 
ing, (a) maximum understanding, or (b)
maximum motivation to continue learning. 
However, further research on the problem 
may reveal ways of directing the discovery 
process without inhibiting the develop- 
ment of autonomous motivation. Teachers 
should continue to strive to guide the 
learning of their students but should re- 
frain from giving answers directly. 

Finally, the findings suggest that more
consideration be given to the influence of
the stimulus materials in directing the
thought processes of the learner. Particu-
larly recommended is a careful study of
the educational “toys” that are used in
some primary classrooms to develop num- 
ber concepts, and the practice exercises in 
workbooks and on worksheets which are 
intended to direct the learner’s attention 
to relationships. 


SUMMARY AND CONCLUSIONS 


An experiment was performed with col- 
lege students in order to test the premise 
that learning by independent discovery 
is superior to learning with direction be- 
cause the learning is more meaningful in 
the former case. “Meaningful learning” 
was used in the cognitive sense of under- 
standing or organization. The Ss were re- 
quired to learn two arithmetical tasks to 
a common criterion. A two-factorial de- 
sign was used which involved three “teach- 
ing” treatments (no-help, direct-reference, 
and rule-given), and two “number” treat- 
ments (A-form and X-form). Approxi- 
mately four weeks later a retest was ad- 
ministered which was designed to test the 
ability to recall and apply the learned rules 
to a somewhat different type of problem 
from that used in the learning period. The 
Ss’ self-reports of their learning process 
and methods used on the retest were ana- 
lyzed. 

The conclusion is that the superiority of
the discovery procedure of learning over
procedures of learning with external di-
rection is not adequately explained in
terms of “meaningful learning.” The re-
sults of this experiment suggest that when
the learner is forced to rely on his own 
cognitive capacities, it is more likely that 
he will become motivated to continue the 
learning process or to continue practicing 
the task after the learning period. Conse- 
quently, the learning becomes more per- 
manent and is more effectively transferred 
than when the learner is not so motivated.


While there is general recognition of
the value of organized learning for
memory and transfer, differences on how
principles should be taught and at-
tained lead to quite different theories of
transfer. On the one hand are those who
decry outside direction of learning when
one is interested in transfer. Katona (7)
with card and geometric puzzles found
that his memorization group was signifi-
cantly poorer on invention and transfer to
new problems than his “Help” group that
had examples. He concluded that “formu-
lating the general principle in words is
not indispensable for achieving appli-
cation,” (7, p. 89) but he was un-
willing to say that learning of principles
by words is always less efficient than by
examples. He put teaching the result as
the poorest method, teaching by stating the
principle as intermediate, and teaching by
example as best. However, Hendrix (5)
found that with a mathematical principle
those that discovered the prin-
ciple independently and left it unverbal-
ized exceeded those who discovered and
then verbalized, and both exceeded in
transfer those who had the principle stated
for them and then illustrated.

Opposed to Katona and Hendrix are
those like Craig who concluded: “The
more guidance a learner receives, the more
efficient his discovery will be; the more
efficient discovery is, the more learning
and transfer will occur” (1, p. 72). In a
further study with college groups and with
the same method of having the S pick
out that alternative among five which
does not fit a principle (2) he verified
that significantly more such problems were
solved when the principle was stated above
it than when the S was given only the
instruction that one of the five items
did not belong. One should note, however,
that he found no difference between his
groups on transfer to new principles nor
was there any difference in retention after
3 or 17 days, although at 31 days the dif-
ference favored the directed group.

While Craig’s experimental results ac-
tually gave little or no support to his
claim that guidance is desirable for trans-
fer, more serious opposition to Hendrix
and Katona came from Kittell (8). He
found that “intermediate direction” (stat-
ing a principle) was significantly superior
to both the “minimal direction” (told
only that one of five alternatives would
not fit) and “maximal direction” (E
told the principle and worked out the
answer for S), with minimal direction
definitely the inferior method. His sub-
jects were sixth graders while Craig’s
were college students. That difference in
age and educational level may explain the
contrast in results. Also Kittell’s low num-
ber of successful solutions (means only
4.59 for intermediate direction and 1.93
for minimal direction out of 15 principles)
suggests that the problems based on lin- 
guistic arrangements and meanings may 
have been too difficult for the Ss. If that 
were the case, then following directions 
in the stated principle was about the only 
way to solve the problem when unpro- 
vided with sufficient apperceptive mass 
and experience. Haslerud (4) found that 
while naive rats transferred anticipatively 
from forced turns near the goal into 
prior free units of a maze just as well 
as when those goal turnings had been 
established by trial and error, only active 
trial and error cul-de-sac elimination in 
the goal region could readjust an estab- 
lished pattern in the prior free units. If 
a similar limitation on effectiveness of 
guidance is present in human Ss, then one 
might expect any advantage primarily in 
young Ss and that mainly on their initial 
learning but none for memory and trans- 
fer where Ss have sufficient background to 
derive a solution themselves. 

While the Katona and Hendrix con- 
cept of how to get maximal transfer seems 
to have face validity, at least for adults, 
their controls and statistical supports are 
unsatisfactory. When one draws his con- 
clusions on the basis of one principle, e.g., 
the sum of the first n numbers, a question 
remains of how much of the conclusion is 
a function of the particular problem used 
or the selection of individuals for the vari-
ous groups. More convincing differentia-
tion of principle given from principle 
derived would seem to require homogene- 
ously varied problems posed in quantity 
to the same individuals. A likely material 
has been found in an extension of the 
familiar cryptogram “Come to London” 
in the Stanford-Binet. An unpublished 
pilot study by the junior author under 
the senior author’s supervision indicated 
an advantage for memory of the inde- 
pendent solving of such coding principles. 
The present study extends a similar
method to transfer. The hypothesis tested
is that principles derived by the learner 
solely from concrete instances will be more 
readily used in a new situation than those 
given to him in the form of a statement of 
principle and an instance. 


PROCEDURE 


Subjects for the experimental group 
were 76 members, ranging from freshmen 
to seniors, of two general psychology 
classes at the University of New Hamp- 
shire. The control group of 24 students 
in another psychology class ranged from 
sophomores to seniors. 

The experimental groups were each 
given two coding tests, the second being 
administered one week after the first. The 
control group was given only the second 
test. All tests were administered by the 
senior author. 

The first test composed of 20 coding 
problems was designed to give the students 
two types of experience: (a) problem 
solving with specific directions for de-
ciphering the code printed above each
problem, and (b) problem solving with no 
directions given. The first part of each 
problem was the four-word sentence “They 
need more time,” followed by the same 
sentence in code. A different code was 
used in each problem. The second part of 
each problem was the four-word sentence 
“Give them five more,” which the Ss were 
asked to translate into the code for that 
problem. The given and derived problems 
were alternated so that the S would solve 
approximately equal numbers of each 
kind. As a control for differences between
the codes, there were two test forms, A
and B. The same codes were used in both,
but those for which directions were given
in form A had to be deciphered by the S
in form B, and vice-versa. The problems
were arranged in approximately the ap-
parent order of difficulty. Examples of
moderately easy coding rules are: “For
each letter of the sentence write the letter
that follows it in the alphabet.” “Write
the first two letters of each word and then
the last two letters of each word.” To in-
troduce the test the senior author told the
Ss that the test was an experiment in
cryptography. He wrote an illustrative
code on the blackboard and purposely
worked it out partly incorrectly to en-
courage remonstrances from the group
that a system or principle was possible.
The Ss were asked to solve the problems in the
order they appeared on the test and to do
as many as they could in the time allotted.
Since the 45 minutes allotted was ample
time for all but one or two students in
each group, the test was essentially a
power rather than a speed test. The Ss
were not told that they would be retested
on the same material.
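
The two example coding rules quoted above can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than facts from the test: that the letter after "z" wraps around to "a", and that the second rule means writing the first two letters of every word followed by the last two letters of every word, with no spaces.

```python
def next_letter_code(sentence: str) -> str:
    """Replace each letter with the letter that follows it in the
    alphabet. Wrapping 'z' to 'a' is an assumption."""
    coded = []
    for ch in sentence:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            coded.append(chr((ord(ch) - base + 1) % 26 + base))
        else:
            coded.append(ch)  # leave spaces and punctuation unchanged
    return "".join(coded)


def first_two_last_two_code(sentence: str) -> str:
    """Write the first two letters of each word, then the last two
    letters of each word (one possible reading of the rule)."""
    words = sentence.split()
    return "".join(w[:2] for w in words) + "".join(w[-2:] for w in words)


model_sentence = "They need more time"
print(next_letter_code(model_sentence))
print(first_two_last_two_code(model_sentence))
print(next_letter_code("Give them five more"))
```

In a problem of the G type the rule is printed above the sentence pair, so S needs only to apply a function of this kind; in a problem of the D type S must infer the function from the model sentence and its coded form.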

The second test, printed only in one
form, was given to both the experimental
and control groups. Again, the 20 codes
used in the first test were used, but in-
stead of the common sentence of Test 1,
there were 20 different English sentences
16 letters in length followed by four
translations into code. Only one translation
was correct and the Ss were asked to
identify it. They were told that the other
three were simply letters arranged in ran-
dom order. They were not told that num-
bers had been assigned to letters of the
alphabet and that letters for two of the
false codes had been selected according
to the order in which those numbers ap-
peared in a list of random numbers. The
third false code was composed of letters
of the English sentence arranged according
to random numbers. The order in which
the four codes followed the sentence and
the order in which the problems were
arranged on the test were also random.
No mention was made of the previous test,
nor was the purpose of the test told until
[illegible] had been given and the results
[illegible].

RESULTS 


The data for each individual in the ex- 
perimental group consisted of four scores: 
(a) Number of correct codings on Test 1 
problems where the rule was given, here-
after called G1 scores. (b) Number of cor-
rect codings on Test 1 problems where the
coding principle had to be derived by the
S, hereafter called D1 scores. (c) Correct
alternatives for those codes in Test 2 that 
had been G type in Test 1. (d) Correct 
alternatives for those codes in Test 2 that 
had been D type in Test 1. In the control 
group the score was the total number of 
correct alternatives on Test 2. Any coding 
was considered correct if no more than 1 
of the 16 letters was wrong, since careless- 
ness rather than lack of understanding of 
the principle was probably responsible for 
the lone error. 
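
The scoring criterion just described can be sketched as a small function. The name `is_correct_coding`, the decision to ignore spaces and letter case, and the treatment of a response of the wrong length as incorrect are all assumptions made for illustration; the source states only the one-error tolerance.

```python
def is_correct_coding(response: str, key: str, max_errors: int = 1) -> bool:
    """Count a coding as correct if no more than `max_errors` letters
    differ from the key. Spaces and case are ignored (an assumption)."""
    response_letters = [c.lower() for c in response if c.isalpha()]
    key_letters = [c.lower() for c in key if c.isalpha()]
    if len(response_letters) != len(key_letters):
        return False  # assumption: wrong length is scored as incorrect
    errors = sum(r != k for r, k in zip(response_letters, key_letters))
    return errors <= max_errors


key = "Uifz offe npsf ujnf"  # a 16-letter coded sentence
one_error = is_correct_coding("Uifz offe npsf ujng", key)
two_errors = is_correct_coding("Uifz offe npsg ujng", key)
print(one_error, two_errors)
```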

Since there was no difference between 
their results, the two experimental classes 
were combined. The analysis of results, 
however, was carried through separately 
for Forms A and B of Test 1 because a 
difference significant at the .05 level indi- 
cated that the 10 odd and the 10 even 
problems had not been exactly equated for 
difficulty. Nevertheless, the direction of 
results for both A and B groups showed 
equally high differentiation of G and D 
situations. 

Test 2 performance of the experimental 
group was significantly different from that 
of the control group. The means, 15.74 and 
10.75 respectively, differ beyond the .001 
level. Apparently something is transferred 
from the Test 1 experience. 

The crucial comparisons are between the 
G and D kinds of problems. For both 
Forms A and B on Test 1, significantly 
more G problems were correctly coded: 
8.86 and 8.36 against 5.86 and 4.88 for G 
and D respectively. The results for Test 2
a week later are given in Table 1. If the
differences are added algebraically to the
Test 1 scores given in the previous sen-
tence, one obtains the nearly equal trans-
fer scores of Craig’s experiment (2). But
since each individual was his own control
for both G and D problems on Tests 1 and
2, it is legitimate to use the subtraction
method to find the standard error of the
difference for paired observations. The
correct identification of those codes which
had been D type on Test 1 increased 46%
while those which had been G decreased
10%. Both changes are significant, at the
.001 and .05 to .01 levels respectively.
There is reason to think that both cur-
tailing time to make Test 2 a speed test
and increasing time to greater than a week
between the learning of the codes on Test
1 and the transfer on Test 2 would ac-
centuate the differences.


TABLE 1
DIFFERENCE IN NUMBER OF PROBLEMS SUCCESSFULLY CODED BETWEEN THE
TRANSFER TEST (TEST 2) AND THE INITIAL LEARNING (TEST 1) WITH EACH
INDIVIDUAL AS HIS OWN CONTROL

                     N    Mean diff.   SD diff.   SE mean diff.    t     Signif.
D2 − D1 Problems
  Test A            36       2.83        3.34         .56         5.06   p < .001
  Test B            40       2.50        3.30         .52         4.81   p < .001
G2 − G1 Problems
  Test A            36      −0.83        2.37         .38         2.17   p < .05
  Test B            40      −1.03        2.21         .35         2.94   p < .01
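
The "subtraction method" for paired observations can be illustrated from the summary values in Table 1: the standard error of the mean difference is the standard deviation of the differences divided by the square root of N, and t is the mean difference divided by that standard error. The sketch below assumes this textbook form (the source does not state whether N or N − 1 was used), so the values it prints agree with the tabled ones only to within rounding. The function name is illustrative.

```python
import math


def paired_t(mean_diff: float, sd_diff: float, n: int) -> tuple[float, float]:
    """Return (standard error of the mean difference, t ratio) for paired data."""
    se = sd_diff / math.sqrt(n)
    return se, mean_diff / se


# Summary values as reported in Table 1: (mean difference, SD of differences, N).
rows = {
    "D2 - D1, Test A": (2.83, 3.34, 36),
    "D2 - D1, Test B": (2.50, 3.30, 40),
    "G2 - G1, Test A": (-0.83, 2.37, 36),
    "G2 - G1, Test B": (-1.03, 2.21, 40),
}

results = {label: paired_t(*values) for label, values in rows.items()}
for label, (se, t) in results.items():
    print(f"{label}: SE = {se:.2f}, t = {t:.2f}")
```

Because each S supplies both members of every pair, individual differences in coding ability cancel out of the difference scores, which is why the smaller paired standard error is appropriate here.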


DISCUSSION


This experiment has added strong sup- 
port to the contention of Katona and 
Hendrix that independently derived prin- 
ciples are more transferable than those 
where the principle is given to the student. 
Even though Ss produced more correct 
codings on the original learning when the 
principle was stated for them, on the “pay- 
off,” or “applying” to use Katona’s term, 
the advantage definitely passed to those
principles derived by the student himself. 
Fast and accurate learning or performance 
under immediate guidance is no guarantee 
of transfer to new problems without such 
support. From Craig’s and our experiments 
the conclusions just stated are supported 
by results on college students, but testing 
of grammar level students by principles of
a more suitable level of difficulty than used
by Kittell (8) might show a wider appli- 
cation. Our coding method could be easily 
adapted for that purpose. 

The obtained results of this experiment 
do not follow from inadequate controls. 
The alternate Forms A and B allowed 
each principle to be given (G) and derived 
(D). Individual differences with respect 
to problem solving in the Ss were ruled 
out since each person responded to 10 G 
and 10 D problems on Test 1 and the 
follow-up of each of these on the transfer 
Test 2. The control group’s much poorer 
performance on Test 2 indicated that a
genuine transfer function was present. 
Making time on each test practically un- 
limited pushed the G and D types of pres- 
entation to their limit as power tests. 

Two possible weaknesses in the transfer 
Test 2 need to be examined. With four al- 
ternatives for each problem, a chance 
score would average 5. The control group
had 10.75 problems correct; this showed 
good adaptation to the test but signifi- 
cantly less than the 15.74 of the experi- 
mental groups. The question whether the 
better performance of the experimental 
group was just the result of a second ses-
sion of practice on coding problems can
probably be answered by reference to the
study by Warren (9). He found that
adults on letter-symbol substitution rap-
idly attain a plateau on transfer prob-
lems because of “learning sets” from early
childhood. Coding is in that class of simple
activities for adults where experience and
practice as such make little difference after 
the first 10 minutes. Even if one took the 
maximum change of 37% during Warren’s
16 five-minute periods, it would be less 
than the nearly 50% advantage of our ex- 
perimental group over the control group. 
The second possible weakness arises from 
the randomized construction of the false 
alternatives of the transfer test. It is con- 
ceded that a person might try to solve the 
problems by excluding the three alterna-
tives because of their random characteris-
tics rather than by trying to recognize
and verify some consistent principle in the
one true alternative. However, the prin-
ciples must have played a significant role
in the solutions because without them the
results of the control and experimental
groups would have been equal since they
had the same instructions and equal op-
portunity to use this abortive device.
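
The arithmetic behind the chance score and the "nearly 50% advantage" can be checked directly from the figures quoted above. The variable names below are illustrative only.

```python
# Transfer Test 2: 20 problems, each with four alternatives (one true, three false).
n_problems = 20
n_alternatives = 4

# Expected number correct from blind guessing.
chance_score = n_problems / n_alternatives

# Mean Test 2 scores reported for the two groups.
control_mean = 10.75
experimental_mean = 15.74

# Relative advantage of the experimental group over the control group.
advantage = (experimental_mean - control_mean) / control_mean

print(f"chance score = {chance_score:.0f}")
print(f"experimental advantage = {advantage:.0%}")
```

The advantage is compared in the text with the maximum practice gain of 37% reported by Warren, which is the basis for arguing that practice alone does not account for the difference.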

The theories of transfer found in current 
educational psychologies are inadequate 
to explain the present experiment. The 
senior author plans to develop in another
paper a theory that transfer is fundamen-
tally an anticipative rather than a
[illegible] function and that to get trans-
fer one must always counteract the finality
of the goal (3). A stated principle to some
extent, and even more Kittell’s “maximum
[illegible]” of E doing the problems for S
after giving him the principle, practically
stops transfer, like other goals. Hendrix
[illegible] from Thorndike that only 5%
of high school students have language
[illegible] sufficient to receive a ready-made
sentence and find readily illustrations in
their own background to provide the pre-
requisite to meaning. If the results of the
present experiment can be verified for a
wider range of ages and apperceptive
mass, then the implications for a direct
effort to teach for transferable princi-
ples can not be neglected.


SUMMARY 
The educationally important question of
how much guidance is desirable if one is
interested in transfer was tested experi-
mentally by a new use of coding. Each of
76 college students as his own control 
translated into 20 different codes a com- 
mon four-word sentence, with the rule 
given for half of the problems and re- 
quired to be derived solely from example 
for the other half. As in previous studies 
on initial learning, the Ss did significantly 
better on those problems with the rule 
given. However, a week later on a mul- 
tiple-choice transfer test consisting of 20 
different sentences, one for each of the 
20 coding principles of the first test, the 
selection of the adequate code from three 
specious ones made by randomizing letters 
gave very different results. The scores 
were significantly increased for those 
problems which had formerly been de- 
rived as contrasted with a significant de- 
crease for those problems where the rule 
had formerly been given. A control group 
of 24 college students given only the 
second test proved by significantly poorer 
performance than the experimental group 
the value of transfer from the first test. 
The results give strong support to the 
postulate of Hendrix that independently 
derived principles are more transferable 
than those given. The apparent contra- 
diction with Kittell’s study of children 
was explained by the smaller apperceptive 
mass in the child, and the prediction was 
hazarded that as naivety is lost, the prob- 
ability of transfer from learning which is 
minimally directed is increased. 


OCCUPATIONAL LEVEL AND THE PRIMARY MENTAL ABILITIES*

Thurstone’s S.R.A. Primary Mental
Abilities test (PMA) is frequently used as
the intelligence test component of a battery
given for guidance purposes. A reason for
this use is the common inference that a
study of the separate and presumably
independent scores for the different abil-
ities will yield clues to predict future suc-
cess in certain school courses and voca-
tions. Since intellectual functioning is
generally found to be an important de-
terminant in predicting successful per-
formance, it would be of interest to vali-
date assertions that one could go a step
further and predict differential success
for a given type of vocational choice.

A scrutiny of the PMA literature sug-
gests that validation studies have been
concerned primarily with the correlation
of the Primary Mental Abilities with a
variety of achievement tests. Examples
of such studies are reported in the test
manual for the relation of the PMAs
with the Stanford Achievement test (5)
and with the Iowa Tests of Educational
Development (4). Other work has related
the PMAs to the United States Employ-
ment Service General Aptitude tests (2).
These studies, done primarily with high
school populations, conclude that the
PMAs are fairly good predictors of cur-
rent achievement and are useful for guid-
ance purposes.

None of these studies provide any in-
formation, however, upon success in pre-
dicting actual occupational choice. The
present inquiry attempts to fill this gap
indirectly by examining a group of adult 
individuals who have made a firm occupa- 
tional choice to see whether they could 
be differentiated in terms of their PMA 
scores as to the type of occupation se- 
lected. 


HYPOTHESES 


The PMA manual was consulted to see 
what kind of predictions were suggested 
by the test authors and others in relating 
performance on the PMAs to activities 
required in various occupations. One pur- 
pose of the PMA profile, for example, 
is its use for estimating the individual’s 
general level of intelligence. Young people 
planning to go to college are presumed 
to require above average standing on most 
of the abilities, but particularly on Ver- 
bal-meaning (V) and Reasoning (R). 
People whose occupational choice results 
in professional types of activity would 
therefore be expected to show high per- 
formance on all abilities but should show 
particular elevation on V and R. Space 
ability (S) is presumed to be important 
for occupations like electrician, machinist, 
engineer or carpenter. Skilled laborers
should therefore be found to be high on
S. Accountants, cashiers, bank tellers, 
sales clerks and the like are supposed to 
be favored by good arithmetic ability and 
should thus be high on Number ability 
(N). People who run their own business 
or belong to the managerial category 
would be expected to have an education 
and skills somewhere in between the pro- 
fessional and clerical groups and would 
thus be expected to be high on some 
attributes common to both. 


METHOD


A great many personality and other 
variables are involved in determining spe-
cific job choice and their control would
be extremely difficult. It was therefore 
decided to concentrate upon a more gen- 
eral differentiation into 10 major occupa- 
tional headings as used in the reports of 
the United States Census Bureau. Since 
the counseling use of the PMA is usually 
at the high school level, four of these 
major occupational classifications were 
selected for study as they are probably 
the most important ones being considered 
in a great majority of cases. These are: 
(a) professional and semiprofessional, (b) 
managerial and proprietary, (c) sales and 
clerical, and (d) skilled labor. In order 
to avoid artifacts introduced by transient 
or enforced job choice or possible PMA 
sex differences, only male Ss were used. 
Since we are interested in stable occupa- 
tional choice no S was to be included if 
he reported a change in his occupation 
or job specification over the past five 
years. 

As part of another investigation, data 
were available on the PMA scores and 
the occupational status of a sample of 
500 adult Ss (3). Since age changes on 
the PMAs over the adult age range are 
known to be substantial, these were con- 
trolled experimentally by matching for 
age over the four occupational levels. 
From a pool of 172 Ss who met the 
initial criteria for inclusion it was thus 
possible to match 20 sets of Ss, or a total 
of 80 Ss. These ranged in age from 26 to 
65 years with a mean age of 45.5 years. 

The S.R.A. Primary Mental Abilities 
test, intermediate form, was given to each 
S and was administered in group sessions 
using the instructions given in the ex- 
aminer’s manual. All raw scores were 
converted to standard scores with means 
of 50 and standard deviations of 10 by 
use of the norms available for the total 
sample of 500 adult Ss. The reported
mean scores are therefore directly com- 
parable for the different mental abilities. 
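
The conversion to standard scores with a mean of 50 and a standard deviation of 10 is the usual linear transformation of a raw score against the norm group. The sketch below is illustrative: the function name is hypothetical and the norm mean and standard deviation are made-up values, since the norms for the 500 adult Ss are not given in the text.

```python
def standard_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a raw score so the norm group has mean 50 and SD 10."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical norms for one ability (not taken from the study).
NORM_MEAN = 30.0
NORM_SD = 8.0

print(standard_score(30, NORM_MEAN, NORM_SD))  # a raw score at the norm mean
print(standard_score(38, NORM_MEAN, NORM_SD))  # one SD above the norm mean
print(standard_score(26, NORM_MEAN, NORM_SD))  # half an SD below the norm mean
```

Because every ability is rescaled against the same norm group, a given standard score means the same relative standing on each ability, which is what makes the means directly comparable.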


RESULTS


The first step in the analysis of the 
test data was to compute means and 
standard deviations which are given in 
Table 1. The analysis of variance was 
then employed for a formal test of the 
null hypothesis with respect to over-all 
differences between the different PMAs
and between occupational levels. The anal- 
ysis for the total sample is presented 
in Table 2 and uses methods suggested 
by Edwards (1). The test for the differ- 
ence between occupational levels is based 
on independent observations and thus uses 
within level variance as its error term. 
The test for the over-all differences be-
tween PMAs, however, requires adjust-
ment for correlation between the mental
abilities. The pooled interaction of indi- 
viduals and PMAs is therefore the correct 
error term for this test. 
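
The choice of error terms described above can be checked against the sums of squares and degrees of freedom reported in Table 2. The sketch below simply divides each sum of squares by its degrees of freedom and forms the F ratios with the error terms named in the text; the variable names are illustrative.

```python
def mean_square(sum_of_squares: float, df: int) -> float:
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


# Sums of squares and degrees of freedom as reported in Table 2.
ms_levels = mean_square(4249.04, 3)           # between occupational levels
ms_individuals = mean_square(15877.56, 76)    # between individuals within level
ms_pmas = mean_square(1912.40, 4)             # between PMAs
ms_interaction = mean_square(575.56, 12)      # levels x PMAs
ms_pooled = mean_square(15688.44, 304)        # pooled individuals x PMAs

# Levels are tested against within-level variance (independent observations).
f_levels = ms_levels / ms_individuals
# PMAs and the levels x PMAs interaction are tested against the pooled
# individuals x PMAs interaction, which allows for correlated abilities.
f_pmas = ms_pmas / ms_pooled
f_interaction = ms_interaction / ms_pooled

print(f"F(levels) = {f_levels:.2f}")
print(f"F(PMAs) = {f_pmas:.2f}")
print(f"F(levels x PMAs) = {f_interaction:.2f}")
```

An interaction F ratio below 1 means the interaction mean square is no larger than its error term, which is consistent with the report that the interaction was not significant.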

Inspection of Table 2 shows that F
ratios for the variance associated with
differences among PMAs as well as be-
tween occupational levels were found to
be significant at the .001 level of con-
fidence and the null hypothesis was there-
fore rejected. The interaction between
PMAs and occupational levels, however,
was not significant. These findings suggest
that there are significant differences in
over-all intellectual level between the
different occupational groups as well as
significant differences between scores on
different abilities for most individuals.
The lack of systematic interaction, how-
ever, indicates that specific PMA profile
patterns are not a function of occupa-
tional level. It appears then that profile
elevation, i.e. level of intelligence as es-
timated by the total PMA test, rather
than profile pattern should be considered
as the significant variable for predicting
future occupational level.


TABLE 1
MEANS AND STANDARD DEVIATIONS ON THE PRIMARY MENTAL ABILITIES
FOR DIFFERENT OCCUPATIONAL LEVELS

(N = 20 in each level)

                   Skilled         Clerical        Managerial      Professional
                   labor           & sales         & propr.        & semi-prof.
                   M      SD       M      SD       M      SD       M      SD
Verbal-meaning     44.0    9.5     51.5    7.4     50.9   10.2     55.1    5.8
Space              51.3   10.0     56.7   11.8     54.6   10.2     55.2    9.6
Reasoning          44.2    7.1     50.4    7.8     49.5    9.8     53.0   [illegible]
Number             48.3    8.7     54.5    8.6     55.2    7.8     57.7   10.4
Word-fluency       44.1    8.7     54.8    9.7     47.4    6.8     53.2    9.8


TABLE 2
ANALYSIS OF VARIANCE FOR THE COMBINED SAMPLE TESTING THE NULL HYPOTHESIS
WITH RESPECT TO DIFFERENCES BETWEEN PRIMARY MENTAL ABILITIES AND
BETWEEN LEVELS OF OCCUPATION

(N = 80; 5 scores for each S)

Source of variation                       Sum of squares    df    Mean square   F ratio
Between levels                                4,249.04        3     1,416.35     6.78*
Between individuals in level                 15,877.56       76       208.92
Total between                                20,126.60       79
Between PMAs                                  1,912.40        4       478.10     9.26*
Interaction: levels × PMAs                      575.56       12        47.96
Pooled interaction: individuals × PMAs       15,688.44      304        51.61
Total within                                 18,176.40      320
Total variance                               38,302.00      399

* Significant at or beyond the .001 level of confidence.


The above analysis does not rule out
the possibility that a given mental ability
may tend to discriminate between dif-
ferent occupational levels while others do
not. It is also possible that differences
between Mental Abilities occur only in
certain but not all of the occupational
levels studied. To clarify these problems
further analyses of variance were made
for each separate occupational level and
also for each of the different Mental Abil-
ities. The results shown in Table 3 indicate
that there are indeed significant differ-
ences between the different PMA mean
scores within both the “skilled labor” and
“managerial” levels. Referring back to
Table 1 it may be seen that for the
“skilled labor” level Space is high, while
low performance is found on all the ver-
bal skills (V, R, and W). In the “man-
agerial” group high scores are found on
Space and Number while this group
is also low on V, R, and W. It is worthy
of note that these patterns obviously
overlap, explaining why the interaction
between occupational level and abilities
cannot be significant and why profile el-
evation turns out to be the significant
discriminator.

Table 4 gives the results of the anal-
ysis of variance for the Primary Mental
Abilities with respect to differences among
occupational levels on each separate
ability. Verbal-meaning, Reasoning, and
Word-fluency are found to differ signifi-
cantly between occupational levels but
Space and Number apparently fail to
discriminate. Examination of the appro-
priate means shows high performance on
Verbal-meaning and Reasoning for the
professional group, low performance for
the skilled laborers, and about equal and
intermediate performance for the mana-
gerial and clerical groups. On Word-flu-
ency the clerical and professional groups
are about equal and high, while the man-
agerial and skilled labor groups are low.


K. WARNER SCHAIE


TABLE 3
ANALYSIS OF VARIANCE FOR THE DIFFERENCES BETWEEN PRIMARY MENTAL
ABILITIES IN EACH SEPARATE OCCUPATIONAL LEVEL, ADJUSTED FOR THE
EFFECT OF CORRELATION WITHIN INDIVIDUALS

(N = 20; 5 scores for each individual)

                     Between PMAs       Between individuals    Residual error
                     MS        F        MS        F            MS
Skilled labor        222.31    6.59*    284.22    7.52*        33.79
Sales & clerical     142.69    2.40     181.06    3.05*        59.43
Managerial           207.03    3.77*    206.58    3.67*        56.28
Professional          72.45    1.16     163.81    2.63*        62.36

* Significant at or above the .01 level of confidence.


TABLE 4
ANALYSIS OF VARIANCE FOR THE DIFFERENCES BETWEEN OCCUPATIONAL LEVELS
ON EACH SEPARATE PRIMARY MENTAL ABILITY ADJUSTED FOR THE EFFECT OF
CORRELATION DUE TO MATCHING FOR AGE OF Ss

(N = 80)

                   Between occupational    Between matched       Residual
                   levels                  individuals           error
                   MS        F             MS        F           MS
Verbal-meaning     604.45    11.99*         98.91    1.96         45.74
Space              108.41    ...           112.80    ...         106.38
Reasoning          277.15     4.57*        119.18    2.19         56.16
Number             318.18     3.94          95.97    [illegible]  80.64
Word-fluency       500.25     7.45*        100.09    1.49         67.04

* Significant at or beyond the .01 level of confidence. Trivial F ratios are omitted.


Another interesting analysis can be
made by inspecting the standard devia-
tions presented in Table 1. Since the pop-
ulation standard deviation has arbitrarily
been assigned to be 10, the standard de-
viation for any subgroup would be ex-
pected to be significantly lower on any
variable which tends to discriminate the
subgroup from the total group. A low
standard deviation would thus indicate
that this is a variable on which the sub-
group is more homogeneous than the gen-
eral population. Such increased homo-
geneity was found for the professional
group on Verbal-meaning and for the
managerial group on Word-fluency. In-
spection of the range of standard devia-
tions among the occupational levels gives
further indications why some of the abil-
ities fail to discriminate between levels.


SUMMARY


Scores on the intermediate form of the
S.R.A. Primary Mental Abilities test were
examined for a stratified sample of male
Ss from four occupational levels to test
the hypothesis that differential perform-
ance on this test is useful in predicting
future occupational placement. Several
hypotheses frequently used in counseling
on the basis of the PMA are presented
and relevant evidence concerning the
PMA patterns of adults who have made
permanent occupational choices is given.
The results of an analysis of variance
showed significant differences between the
over-all ability for different occupational
levels and also between different abilities.
The interaction between occupational level
and individual mental abilities, however,
was not significant.

Significant differences were also found
between abilities within the “skilled labor”
and “managerial” groups. Analysis of the
separate mental abilities showed signif-
icant differences between the mean scores
of different occupational groups on Ver-
bal-meaning, Reasoning, and Word-flu-
ency.

It should be pointed out that the pres-
ent study was concerned only with oc-
cupational levels. Pattern analysis of the
PMA might therefore still be helpful for
predicting success in a specific occupation.
On the basis of the present findings, how-
ever, it must be concluded that profile
elevation (or general intellectual level)
is of greater importance than profile pat-
tern in predicting vocational choice.


Although many investigators (2, 3, 4,
8, 9, 10) have demonstrated repeatedly 
that the counselor or test examiner may 
either deliberately or inadvertently struc- 
ture the stimulus field, a review of the 
literature indicates that minimal research 
has been done regarding some of the char- 
acteristics of such influence. Only recently 
did Bordin (1) suggest the theoretical 
implications of the ambiguity structured- 
ness variable in the counseling process. 

It was the purpose of this study to in- 
vestigate temporal effects and the verbal 
responses per se resulting from structuring 
the instructions regarding what would be 
appropriate responses using the free as- 
sociation technique. According to Bordin, 
if a group of “minimally anxious” Ss were 
used, it could be deduced from his theo- 
retical approach that the suggestions of 
appropriate responses to some words for 
these subjects would not influence the 
responses made to subsequent words. This
study attempts to investigate this hy- 
pothesis. 

A secondary purpose was to determine 
if there were regional differences occurring 
in the free associating technique. 


PROCEDURE 


The Ss were 401 college students at-
tending the University of Mississippi dur-
ing the 1955-56 school year. Of these, 120
were enrolled in sophomore and junior
year education courses, 130 in sophomore
year psychology courses, and the remain-
der in economics, engineering and under-
graduate courses in statistics. No sex
differentiation was made. The Ss were di-
vided into three groups according to in-
structions given the students on the Kent-
Rosanoff Free Association Test (6). An
attempt was made to have equal represen-
tation of students from the various types
of classes in each of the three groups.

Group I. The instructions given by 
Russell and Jenkins (7) for administering 
the Kent-Rosanoff Free Association Test 
were used and are as follows with the ex- 
ception that Mississippi was substituted 
for Minnesota: 


This is one of the studies in verbal be-
havior being done at Mississippi. This par-
ticular experiment is on free association.

Please write your name on the outside of
the paper passed to you. You can ignore the
place for your name on the other side.

When you open these sheets, you will
see a list of 100 stimulus words. After each
word write the first word that it makes you
think of. Start with the first word; look
at it; write the word it makes you think of;
then go on to the next word.

Use only a single word for each response.

Do not skip any words.

Work rapidly until you have finished
all 100 words.

When you are through, turn your paper
over and write on the back the letter that
appears on the board at that time.

Are there any questions?

Ready. Go.


The following additional section ap-
peared after their third paragraph:

For Example, Your Responses to the First
Words Might be as Follows:

No.   Stimulus   Response
1     Table      Chair
2     Dark       Light
3     Music      Song
4     Sickness   Health
5     Man        Woman

These example response words were those
listed by Russell and Jenkins as the most 
common responses to the respective stimu- 
lus words. This group consisted of 133 
students and will be referred to hereafter 
as the positively structured group. 

Group II. The instructions to this 
group were the same as those for Group I 
except that the given “response word” 
examples had a frequency of 10 in 1031 
samples as listed by Russell and Jenkins; 
therefore, these were considered uncom- 
mon or atypical responses to the respec- 
tive stimulus words. 

These response words were: 


No. Stimulus Response 
1 Table Eat 
2 Dark White 
3 Music Dance 
4 Sickness Bad 
5 Man Mouse 


This group consisted of 133 students and 
will be referred to hereafter as the nega- 
tively structured group. 

Group III. The instructions given to 
this group were identical to those given 
by Russell and Jenkins; i.e., no response 
example was given. This group served as 
the control group and consisted of 135 
students. 


The above instructions for the three re-
spective groups appeared on the first page
of a three-page mimeographed test book-
let. The second and third pages followed
the form given by Russell and Jenkins.

In the present study the same procedure
as that described by Russell and Jenkins
was followed (7). After the students had
been working on the test for four minutes,
the letter A was printed on the black-
board. Every 30 seconds thereafter a new
letter was substituted in alphabetical se-
quence. When the students had completed
the test, they recorded on the back of
the test booklet the letter appearing on
the board at that time. Consequently a
rough index of time necessary for each
student to complete the test could then
be obtained.
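
The timing procedure amounts to a simple mapping from the recorded letter to elapsed seconds. The sketch below assumes that each letter is credited with the moment it went on the board (A at 240 seconds, B at 270, and so on); the text does not say how the authors converted letters to seconds, and the function name is illustrative.

```python
import string


def completion_time_seconds(letter: str, start: int = 240, step: int = 30) -> int:
    """Elapsed seconds at which the recorded letter was put on the blackboard.

    Assumes 'A' appeared after four minutes (240 s) and a new letter every 30 s.
    The true finishing time lies somewhere in the following 30-second interval.
    """
    index = string.ascii_uppercase.index(letter.upper())
    return start + step * index


for recorded in ["A", "B", "K"]:
    print(recorded, completion_time_seconds(recorded))
```

Under this assumption the index is accurate only to within one 30-second step, which is why the text calls it a rough index.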

Following the administration of the 
tests, response frequencies were tabulated 
for each stimulus word for each of the 
three groups. The frequencies were then 
converted into percentages and tests of 
significances were made according to the 
procedure suggested by Lawshe and Baker 
(5). 

A difference in percentage ratio was 
used to investigate the influence of the 
suggested answers. The formula used was 
(E_x − C)/C, in which E_x was the per-
centage of responses in the experimental
group and C was the percentage of re- 
sponses in the control group. The value 
of this ratio lies in offering a convenient 
way of showing how the values of the ex- 
perimental groups converged with that of 
the control group. 
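
As a worked illustration of the difference ratio, the sketch below applies the formula to the percentages reported in Table 2 for the first stimulus word (Table, with Chair as the most common response). The function and variable names are illustrative.

```python
def difference_ratio(experimental_pct: float, control_pct: float) -> float:
    """Difference in percentage ratio: (E - C) / C."""
    return (experimental_pct - control_pct) / control_pct


# Percentages giving the most common response (Chair) to stimulus word Table.
group_i = 94      # positively structured group
group_ii = 28     # negatively structured group
group_iii = 81    # control group

ratio_i = difference_ratio(group_i, group_iii)
ratio_ii = difference_ratio(group_ii, group_iii)

print(f"Group I vs. control:  {ratio_i:+.2f}")
print(f"Group II vs. control: {ratio_ii:+.2f}")
```

A ratio near zero means the experimental group has converged on the control group; a positive ratio means the most common response was given more often than in the control group, and a negative ratio less often.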

Throughout this article, references to 
the expression, “most common response 
word,” pertain to the word that was listed 
by Russell and Jenkins as the word having 
the highest response frequency for each 
of the 100 stimulus words. Unless other- 
wise indicated, all statistical analyses in 
this article are based upon the frequencies 
of these 100 most common response words. 
No attempt was made to evaluate dif- 
ferences of other response words. 


RESULTS AND DISCUSSION 


Temporal effects. An analysis of vari- 
ance was computed for the length of time 
required by the three groups to complete 
the tests. The F ratio was found to be 
11.14 which was significant beyond the 
one per cent level. Consequently, these 
data were then examined for t values and 
the means and standard deviations are 
given in Table 1. 

TABLE 1
TIME IN SECONDS REQUIRED TO COMPLETE TEST

Group     N      M     SD
I        133    536    139
II       133    511    121
III      135    473    121


It was revealed that Groups I and II
did not vary significantly from each other,
but the differences were significant at the
one per cent level between Groups I and
III and between Groups II and III in
time required to complete the tests. These
data indicate that with the type of sug-
gestion given to Groups I and II on a free
association test, whether it be a common
or uncommon answer, the time required
to respond will be increased. It is to be
noted that the most common or normal
suggestions to a “normal” population re-
sulted in the longest response time. The
authors have no suggestion as to why
this behavior occurred but are now in the
process of attempting to duplicate this
behavior with a different population and
testing the hypothesis that differences in
response time will disappear when certain
variables are controlled.

Detailed examination of most common
responses given to first 10 words. After all
response words to the respective stimulus
words had been tabulated, the frequencies
were converted into percentages. In order
to examine the influence of suggestion, a
difference in percentage ratio was used
and the data for the first 10 stimulus
words are given in Table 2.


TABLE 2
PERCENTAGE OF Ss IN EACH GROUP GIVING MOST COMMON RESPONSE WORD TO
EACH OF THE FIRST 10 STIMULUS WORDS AND THE DIFFERENCE RATIO BETWEEN
UNIVERSITY OF MISSISSIPPI GROUPS

                              Percentage of group responses        Difference ratio
     Stimulus    Response     I(a)   II(b)   III(c)   Minn.(d)     (I−III)/III   (II−III)/III
 1.  Table       Chair         94     28      81       84              .16          −.65
 2.  Dark        Light         87     50      71    [illegible]        .23          −.30
 3.  Music       Song/s        58     11      15       18             2.87          −.27
 4.  Sickness    Health        67     24      30       38             1.23          −.20
 5.  Man         Woman/en      92     66      81       77              .14          −.19
 6.  Deep        Shallow       60     47      44       32              .36           .07
 7.  Soft        Hard          64     56      55       45              .16           .02
 8.  Eating      Food          34     24      35       39             −.03          −.31
 9.  Mountain    Hill/s        44     44      40       27              .10           .10
10.  House       Home          33     29      28       25              .18           .04

(a) N for group I = 133.
(b) N for group II = 133.
(c) N for group III = 135.
(d) Percentages are based on Russell and Jenkins data.

It should be recalled that differences 
are based upon but one word in each re- 
sponse set; that is, the most common re- 
sponse word as given by Russell and 
Jenkins. These data in Table 2 revealed 
that by the time the students reached 
the sixth stimulus word, for all practical 
purposes, the influence of the suggested 
words had been dissipated. This trend was 
more consistent and stable in the nega- 
tively structured group (Group II) than 
in the positively structured group (Group 
I). The difference in response frequency for 
the most common response words between
Groups I and II and between Groups I
and III was significant at the one per cent
level for the first four stimulus words.
Also, there was found a significant dif-
ference at the one per cent level between
Groups II and III for responses given to
stimulus words [1] Table and [2] Dark.
A possible explanation of the lack of sig-
nificance between Groups II and III in
response frequency to stimulus words [3]
Music and [4] Sickness is that the habit
strength for the most common response
might be considered relatively weak. Ac-
cording to Russell and Jenkins the most
common response for the stimulus word
Music occurs with a frequency of 18 per
cent and the frequency for Sickness is 38
per cent (7). The fact that percentage
differences are significant between Groups
I and II and Groups I and III for these
same words may be a function of the sug-
gestions given in the instructions and/or
the lack of a strong competing response
habit strength. The authors are now in-
vestigating this possibility by ranking the
words in the Kent-Rosanoff word list ac-
cording to the response strength of each
stimulus word and repeating this study.

The explanation for the significant dif-
ference in percentage response for the
stimulus word [5] Man between Groups I
and II and the absence of a significant dif-
ference between Groups I and III and
Groups II and III is more difficult. It
may be that negative suggestion dissipates
more rapidly than positive suggestion and
this could have been operating to pro-
duce such an effect. More likely is the
fact that rate of dissipation is confounded
with the problem of unequal response
habit strengths among the first five words.

Examination of the remaining 90 most
common response words. As indicated
above, the instructions did not influence
the response beyond the fifth word. The
only significant differences that were
found between any two of the three groups
were for the stimulus words [20] Chair,
[23] Woman, [59] Health, and [88] Heavy.
Although these differences were significant
at the one per cent level, an a priori ex-
planation would be that they have oc-
curred by chance and were due neither
to the instructions nor to the samples that
were used.

Regional differences. Regional differ- 
ences were examined by comparing the 
response frequencies of college students 
at the University of Minnesota with the 
response frequencies made by the control 
group students at the University of Mis- 
sissippi. Only one response word was 
found to be significantly different: i.e., 
for stimulus [24] Cold, at the University 
of Minnesota, 34 per cent gave the re- 
sponse Hot, whereas 60 per cent of the 
college students at the University of Mis- 
sissippi gave that response. 


SUMMARY 


Four hundred and one undergraduate 
college students were divided into three 
groups for administration of the Kent- 
Rosanoff Word Association Test, under 
the following conditions: The first group 
of students was given five examples of 
common responses to the stimulus words; 
a second group of students was given five 
examples of uncommon responses or re-
sponses occurring approximately one per
cent of the time; and a third group of 
students was given no example. The re- 
sulting response frequencies were com- 
pared. There was no apparent difference 
in responses after the sixth word among 
those students who were given common or 
“normal” response examples, those given 
atypical responses, and those given no 
example of responses to the stimulus 
words. No differences were found be- 
tween responses given by college students 
at the University of Minnesota and col- 
lege students at the University of Mis- 
sissippi. This research suggests that the 
influence of any response instructions
given to a normal population on a free
association test will be rapidly dissipated.

The extent of public support for scien-
tific research and education is dependent
upon the attitudes toward science and
scientists which prevail in the culture. The
development of these attitudes begins
early in the elementary school (2). These
attitudes are solidified by the time stu-
dents reach the secondary school level
(6, 8) where they influence the choice of
a future career (7). The attitude of the
public toward the current Man-Into-Space
Program is at least partially influenced by
a general attitude of both respect for and
a fear of the influence of scientific ad-
vance upon our society. Stated somewhat
differently, this ambivalent feeling toward
the research scientist means that he is
perceived as a “different” and perhaps
slightly dangerous individual, but also a
necessary and even useful member of our
society. These attitudes would appear to
be particularly significant as far as ele-
mentary and secondary school teachers are
concerned since they are in daily contact
with potential future scientists during the
years when these attitudes are develop-
ing.

An indirect and somewhat disguised ap-
proach to Ss’ attitudes toward scientists
can be through a study of the consistency
with which Ss attribute a syndrome of per-
sonality characteristics to the average
scientist. It is clear that stereotypes of the
personality traits of people in various oc-
cupations do exist among college students,
and identification of the specific
traits that Ss believe distinguish the sci-
entist from people in other occupations
would provide an insight into their atti-
tudes toward the scientist and permit the
development of a relatively
simple and objective assessment device
to measure these attitudes. Terman’s (10)
investigation of intellectual and interest 
differences among four occupational 
groups, scientists, engineers, lawyers, and 
businessmen, suggests that comparisons 
among the personality traits attributed 
to these occupations might provide evi- 
dence as to the students’ stereotype of the 
personality of the research scientist. 


PROCEDURE 


Traits. An original list of approximately 
100 personality trait names was compiled 
from several published sources (1, 8, 9). 
Subsequently a list of 60 traits was selected 
on the basis of two criteria: (a) 30 traits 
that appeared on an a priori basis to be 
socially desirable and 30 traits judged to 
be socially undesirable, and (b) traits 
were selected that appeared representative 
of many significant dimensions of behavior 
including work habits, intellectual char- 
acteristics, and both the social and non- 
social aspects of personality. These cri- 
teria were employed (a) to minimize 
response bias in the subsequent ratings and 
to include a wide range of social desira- 
bility in the selected personality charac- 
teristics, and (b) to insure as far as pos- 
sible an adequate sampling of many 
different areas of behavior. The trait names 
finally selected were: accurate, calm, 
clumsy, fearful, considerate, meddlesome, 
intellectual, economical, democratic, in- 
ept, egotistical, cruel, logical, mature, un- 
systematic, pessimistic, friendly, sarcastic, 
studious, alert, kind, disorganized, timid, 
critical, orderly, responsible, incompetent,
impulsive, tactful, annoying, precise, sin-
cere, humorous, unimaginative, reckless,
irritable, persistent, stable, sloppy, nerv-
ous, sympathetic, shy, thorough, self-
confident, generous, unproductive, miserly,
fault-finding, industrious, dependable, in-
efficient, moody, tolerant, argumentative,
capable, unreliable, rigid, poised, lonely,
charming.

Forms. Four occupational titles were 
selected (research scientist, engineer, law- 
yer, businessman), similar to the com- 
parison groups used by Terman (10), 
and six forms were prepared, one form for 
each of the possible combinations of two 
of the four occupations. On each form the 
S was requested to compare the average 
person in one (rated) occupation with 
the average person in another (reference) 
occupation and to make a judgment as to 
whether the first person would be most 
likely to have more, less, or an equal 
amount of each of the 60 traits than the 
person in the second (reference) occupa- 
tion. For example, the significant portions 
of the instructions for Form C were: 


We all know that a person’s interests, 
abilities, attitudes, and personality charac- 
teristics determine to a large extent what 
occupation he or she will select. For ex- 
ample, the average research scientist has 
more or less of certain traits than does the 
average businessman, although on other 
traits these two people will have the same 
amount of these particular traits. We are 
asking you to identify which of these traits
distinguish the research scientist from the
businessman and which traits they have in
common.

Below is a list of 60 traits to be identi- 
fied.... Please indicate on your answer 
sheet your judgment for each of the traits 
using the following marking system: 

Column A: the average research scientist 
has more of this trait than the average 
businessman. 

Column B: both the average research 
scientist and the average businessman 
have about the same amount of this 
trait. 

Column C: the average research scientist
has less of this trait than the average
businessman.


The combinations of occupations used on
the forms are as follows: (a) Research
Scientist vs. Engineer; (b) Research Sci-
entist vs. Lawyer; (c) Research Scientist
vs. Businessman; (d) Engineer vs. Law-
yer; (e) Engineer vs. Businessman; and
(f) Lawyer vs. Businessman.

A seventh form (Form G) was con- 
structed that requested the Ss to rate 
each of the 60 trait names on a five-point 
scale of social desirability. No reference 
was made on this form to specific occupa- 
tions, but the Ss were told that we were 
trying to obtain relative measures of the 
social desirability of a large number of 
personality traits. This last form was used 
as a check on our original dichotomization 
of the traits into socially desirable and 
undesirable groups. 

Subjects. Form G was administered to 
54 Ss (18 men and 36 women) enrolled in 
two sections of introductory educational 
psychology. Forms A through F were ran- 
domly distributed to 154 Ss in four other 
sections of the same course, each S re-
ceiving only one form. Sixteen Ss were
discarded from this second group to in-
sure that equal numbers of men and
women Ss responded to each form. The
discarding of Ss from each form-sex sub-
group was random and the final group
consisted of 138 Ss with 23 Ss (8 men
and 15 women) recording their judgments
on each of the six forms. The Ss were
sophomore pre-education students who
are required to take this course prior to
admission to the School of Education.


RESULTS


The mean social desirability rating of
each trait by the 54 Ss who received Form
G was computed and no overlap was found
in mean ratings between the 30 traits that
had been a priori selected as socially de-
sirable and the 30 traits selected as socially
undesirable. Consequently, the original
grouping of the items into these two classes
was retained in subsequent analyses.

The answer sheets of the 138 Ss who
responded to Forms A through F pro-
vided four separate scores: the number
of desirable traits the rated occupation
had more of (Desirable-More), the num-
ber of desirable traits the rated occupation
had less of (Desirable-Less), the number
of undesirable traits the rated occupation
had more of (Undesirable-More), and the
number of undesirable traits the rated oc-
cupation had less of (Undesirable-Less).
Each of these four scores was then sepa-
rately subjected to a two-criterion (sex
and forms) analysis of variance. The re-
sults are reported in Table 1. Both the
Desirable-More and Desirable-Less scores
discriminated among the six forms at the
.001 level of confidence, but neither the
Undesirable-More nor the Undesirable-
Less scores gave any evidence of signifi-
cant differences among the forms. No
statistically significant (.05 level) sex dif-
ferences were found in any of the analy-
ses. Apparently the Ss could consistently
discriminate differences among the pairs
of occupations with respect to the presence
or absence of desirable traits shown by
the average individual in the four occupa-
tions but did not discriminate among the
occupations as to undesirable personality
traits. This suggests that stereotypes con-
cerning occupational personalities involve
the presence or absence of socially de-
sirable traits only.


TABLE 1

ANALYSES OF VARIANCE OF FOUR PERSONALITY TRAIT SCORES DISTINGUISHING
BETWEEN COMBINATIONS OF OCCUPATIONS

                     Desirable-More   Desirable-Less   Undesirable-More   Undesirable-Less
Source of             Mean             Mean             Mean               Mean
Variation      df     Square     F     Square     F     Square     F       Square     F

Sex (S)         1       5.22    .29     38.74   2.77     32.36   2.13      36.27   2.05
Forms (F)       5     147.60   8.19*   138.06   9.87*    18.01   1.19      11.92    .67
S × F           5      16.67    .92      7.61    .54     17.46   1.15      24.53   1.39
Within        126      18.03            13.99            15.19             16.67

* p < .001


TABLE 2

MEAN NUMBERS OF DESIRABLE TRAITS
ATTRIBUTED TO FOUR OCCUPATIONS IN
SIX COMBINATIONS (n = 23, N = 138)

                                            Desirable Traits
        Occupation    Reference
Form    Rated         Occupation      More     Less     Difference

A       Scientist     Engineer         9.7      5.7        4.0
B       Scientist     Lawyer           6.7      7.3       -0.6
C       Scientist     Businessman     10.0      6.8        3.2
D       Engineer      Lawyer           5.8     10.7       -4.9
E       Engineer      Businessman      7.0      6.9        0.1
F       Lawyer        Businessman     12.4      3.1        9.3


The mean number of desirable traits at-
tributed to each pair of occupations can
be found in Table 2. The difference score


between the “more” and “less” means can 
be viewed, in a sense, as a “favorability” 
score for the rated occupation when com- 
pared with the reference occupation. Both 
the sum of the “more” and “less” scores 
and the difference between these two 
scores were subjected to two-criterion (sex 
and forms) analyses of variance. The sums 
did not discriminate between the forms 
(F = 1.23) while the difference score
showed form differences that were signifi-
cant at the .001 level (F = 14.75). No
sex differences were found in either analy-
sis. Duncan’s method was used to test the
significance of the differences among the
difference score means (4, 5, pp. 26-29)
and it was found that the difference score
means fell into four groups. The difference
mean of Form D was significantly (.05
level) larger than the difference means of
Forms A and B. Forms A and B were
significantly different from Forms C and
E, and the mean difference score of Form
F was significantly lower than the mean
difference scores of Forms C and E. The
differences in means between Forms A
and B and also between Forms C and E
were not significant. The research sci-
entist and the lawyer were quite similar
in difference (“favorability”) scores as
were the engineer and businessman. How-
ever, the scientist-lawyer pair of occupa-
tions were quite distinct from the en-
gineer-businessman in mean difference
scores.


TABLE 3

SINGLE PERSONALITY TRAITS MOST CONSISTENTLY ATTRIBUTED
TO SIX COMBINATIONS OF OCCUPATIONS

Occupations Compared          The First Occupation Is More      The First Occupation Is Less

Research Scientist vs.        Persistent, Studious,             Economical
Engineer                      Intellectual, Alert, Thorough

Research Scientist vs.        Precise                           Charming, Poised, Humorous,
Lawyer                                                          Self-confident, Friendly

Research Scientist vs.        Thorough, Studious, Precise,      Charming, Tactful, Friendly,
Businessman                   Intellectual, Orderly,            Humorous, Economical
                              Persistent, Accurate, Logical

Engineer vs. Lawyer           Precise                           Poised, Charming,
                                                                Self-confident, Tactful, Alert

Engineer vs. Businessman      Precise, Accurate, Studious,      Tactful, Humorous, Poised
                              Thorough

Lawyer vs. Businessman        Studious, Intellectual, Poised,   Economical
                              Precise, Thorough, Logical,
                              Persistent, Tactful, Tolerant
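As a rough arithmetic check on Tables 1 and 2, the sketch below recomputes the F ratios for the two desirable-trait scores (each effect mean square divided by the within-cells mean square) and the "favorability" difference scores (mean "more" minus mean "less"). The figures are copied from the tables; the function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Mean squares copied from Table 1 (Desirable-More and Desirable-Less columns).
MEAN_SQUARES = {
    "Desirable-More": {"sex": 5.22, "forms": 147.60, "sex_x_forms": 16.67, "within": 18.03},
    "Desirable-Less": {"sex": 38.74, "forms": 138.06, "sex_x_forms": 7.61, "within": 13.99},
}

# Table 2 means: form -> (mean number "more", mean number "less").
TABLE_2 = {
    "A": (9.7, 5.7),
    "B": (6.7, 7.3),
    "C": (10.0, 6.8),
    "D": (5.8, 10.7),
    "E": (7.0, 6.9),
    "F": (12.4, 3.1),
}


def f_ratios(mean_squares):
    """Divide each effect mean square by the within-cells mean square."""
    within = mean_squares["within"]
    return {
        effect: round(value / within, 2)
        for effect, value in mean_squares.items()
        if effect != "within"
    }


def difference_scores(table):
    """Favorability score per form: mean 'more' minus mean 'less'."""
    return {form: round(more - less, 1) for form, (more, less) in table.items()}


def within_df(n_subjects, n_sexes, n_forms):
    """Within-cells degrees of freedom for a two-criterion layout."""
    return n_subjects - n_sexes * n_forms


if __name__ == "__main__":
    for score_name, mean_squares in MEAN_SQUARES.items():
        print(score_name, f_ratios(mean_squares))
    print(difference_scores(TABLE_2))
    print("within df:", within_df(138, 2, 6))
```

Run as a script, this prints the recomputed ratios and differences so they can be compared with the tabled entries.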

Analyses were performed for the 30
desirable traits on each form to identify
the single traits that distinguished between
each pair of occupations. The “more” and
“less” percentages (N = 23) were com-
puted for each trait on each form and if
either percentage was greater than 57 per
cent the trait was selected as being a
discriminating item. The results of these
individual trait analyses can be found in
Table 3. It seems apparent that stereo-
types exist for all four occupations. The
scientist and lawyer appear similar in
the more intellectual traits, but the sci-
entist lacks the warm social graces that
characterize the lawyer stereotype. The
engineer is a junior edition of the scientist
while the businessman lacks the intellec-
tual qualities of the lawyer, but shares
many of his social traits.

Since our basic interest was in the
stereotype of the research scientist, the
traits that most consistently discriminated
the scientist from the other three occupa-
tions are given in Table 4. Again the
same stereotype pattern appears: the sci-
entist, along with the lawyer, has more of
the socially desirable intellectual and work
habit traits than does the engineer and
the businessman, while the scientist, like
the engineer, has less of the social graces
than does the lawyer and businessman.
The 12 traits listed in Table 4 appear to
constitute the core stereotype that the
Ss had regarding the research scientist.
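The selection rule for single traits (a trait is kept when either its "more" or its "less" percentage among the 23 Ss on a form exceeds 57 per cent) can be sketched as follows. The tallies in the example are hypothetical and are not data from the study; the function name and the input layout are likewise assumptions made for illustration.

```python
GROUP_SIZE = 23          # Ss per form, as reported in the text
THRESHOLD_PERCENT = 57   # a trait is kept when either percentage exceeds this


def discriminating_traits(counts, group_size=GROUP_SIZE, threshold=THRESHOLD_PERCENT):
    """Return {trait: 'more' or 'less'} for traits that pass the cutoff.

    counts maps trait -> (number answering "more", number answering "less").
    """
    selected = {}
    for trait, (n_more, n_less) in counts.items():
        pct_more = 100 * n_more / group_size
        pct_less = 100 * n_less / group_size
        if pct_more > threshold:
            selected[trait] = "more"
        elif pct_less > threshold:
            selected[trait] = "less"
    return selected


# Hypothetical tallies for one form (illustration only, not study data).
example_counts = {"precise": (20, 1), "friendly": (2, 15), "calm": (8, 7)}

if __name__ == "__main__":
    print(discriminating_traits(example_counts))
```

With 23 Ss per form, the strict cutoff used in this sketch means that at least 14 Ss must agree on "more" or on "less" before a trait is selected.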


DISCUSSION 


The evidence that our college Ss con-
sistently attribute certain personality
traits to the occupations of research sci-
entist, engineer, lawyer, and businessman
confirms the results of other studies (1,
...) in which different occupations and dif-
ferent methodologies were used. The find-
ing of most general interest was that the
Ss discriminated among the occupations
only for socially desirable traits and not
for socially undesirable traits. Whether
this indicates a reluctance on the part of the
Ss to say that one occupation has more
of the undesirable traits, or hesitation
in attributing relative freedom from un-
desirability traits to the paired occupation,
cannot be determined from our data. The
implications for research where the S is
asked to attribute personality traits to
himself and also to other people are ob-
vious.

The analysis of individual traits com-
prising the stereotypes of the personalities
of the four occupations suggests that
the occupations were discriminated along
two relatively independent dimensions.
The traits can be grouped into (a) those
related to intellectual and work habits
characteristics and (b) those related to
personality traits that arise pri-
marily in interpersonal relations. Research
scientists are viewed as being high (having
more of the traits) on the intellectual di-
mension and as being low (having fewer
of the traits) on the social axis. Engineers
are not as intellectual as scientists, but
members of both professions are equally
lacking in social graces. The lawyer is
equally high on both axes, while the busi-
nessman is low on the intellectual dimen-
sion and high on social traits. Whether or
not other occupations can be located
within this two-dimensional system, or
whether other dimensions would have to
be added are questions for further study.


TABLE 4

PERCENTAGES OF SUBJECTS SAYING THAT
CERTAIN SOCIALLY DESIRABLE TRAITS
DISTINGUISH THE RESEARCH
SCIENTIST FROM MEN IN
OTHER OCCUPATIONS

                               Than the Average
Research Scientist
is More               Businessman    Engineer    Lawyer

Intellectual               70           70         26
Logical                    61           52         26
Orderly                    70           48         48
Persistent                 70           61         30
Precise                    87           39         78
Studious                   91           78         43
Thorough                   96           65         39

                               Than the Average
Research Scientist
is Less               Businessman    Engineer    Lawyer

Charming                   70           34         83
Friendly                   65           39         57
Humorous                   61           39         70
Poised                     52           30         74
Self-confident             35           48         61

The 12 traits listed in Table 4 appear 
to offer a possibility of developing a short 
objective scale for measuring the extent of 
the stereotype of the research scientist held 
by individual Ss. The same procedure used 
in this study could be repeated by ad- 
ministering Forms A, B, or C to Ss and 
scoring each S as to how many of the
first seven traits the S says the scientist
has more of and how many of the last
five traits he indicates the scientist has
less of. Such a short 12-item scale may
lack sufficient reliability for research pur- 
poses, but, if reliable, would offer a method 
of quantifying this aspect of attitudes for 
further research. 
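A minimal sketch of how such a 12-item score might be computed is given here, assuming that each S's answer sheet is recorded as a mapping from trait name to the column marked (A = more, B = same, C = less, following the marking system quoted in the instructions). The trait lists are those of Table 4; the function name, the data layout, and the example answers are hypothetical.

```python
# The seven "more" traits and five "less" traits listed in Table 4.
MORE_TRAITS = (
    "intellectual", "logical", "orderly", "persistent",
    "precise", "studious", "thorough",
)
LESS_TRAITS = ("charming", "friendly", "humorous", "poised", "self-confident")


def stereotype_score(answers):
    """Count stereotype-consistent answers on the 12 traits (range 0 to 12).

    answers maps trait -> 'A' (scientist has more), 'B' (same), or 'C' (less).
    Traits missing from the mapping are simply not counted.
    """
    more_hits = sum(1 for trait in MORE_TRAITS if answers.get(trait) == "A")
    less_hits = sum(1 for trait in LESS_TRAITS if answers.get(trait) == "C")
    return more_hits + less_hits


# Hypothetical answer sheet for one S (illustration only).
example_answers = {
    "intellectual": "A", "logical": "A", "orderly": "B", "persistent": "A",
    "precise": "A", "studious": "A", "thorough": "A",
    "charming": "C", "friendly": "B", "humorous": "C", "poised": "C",
    "self-confident": "A",
}

if __name__ == "__main__":
    print(stereotype_score(example_answers))
```

Higher scores indicate closer agreement with the group stereotype; whether such a score is reliable enough for research use is the open question raised in the text.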

It is particularly interesting to note 
that the Ss displaying this stereotype of 
the research scientist were pre-education 
students who, in a few years, will be 
teaching children and adolescents from 
whom the next generation of scientists 
must be recruited. If this stereotype con- 
tinues to be transmitted from teacher to 
student, the problem of interesting high 
school students in scientific careers will 
remain with us. 


SUMMARY 


Pre-education college Ss (N = 138)
were asked to compare two of four oc- 
cupations (research scientist, engineer, 
lawyer, businessman) as to whether the 
average members of the paired occupations 
would have more, less, or an equal amount 
of each of 60 personality traits. Equal 
numbers of Ss (N = 23) responded to 
each of the six possible pairs of occupa- 
tions. The traits were evenly dichotomized 
into socially desirable and socially un- 
desirable groups on the basis of an a 
priori selection which was validated by 
having another group of Ss (N = 54) rate 
the traits for social desirability. The num- 
ber of socially desirable traits attributed 
to each occupation discriminated among 
the occupations (.001 level), but the 
socially undesirable traits did not. Analy-
ses of the 30 individual socially desirable
traits indicated that the Ss viewed the
scientist and lawyer as having more of
the intellectual traits while the lawyer and
the businessman were perceived as having
more of the desirable interpersonal traits.
Both the scientist and engineer have less
of the interpersonal traits and the busi-
nessman has few of the intellectual traits.
The most consistent stereotype in this
study regarded the research scientist, when
compared with the other three occupa-
tions, as being more intellectual, logical,
orderly, persistent, precise, studious,
thorough, and also as being less charming,
friendly, humorous, poised, and self-
confident.


A. W. BENDIG AND PETER T. HOUNTRAS


... people are necessarily concerned
with language habits and attitudes that
develop in the primary grades. When these
beginnings involve spelling they may influ-
ence a child’s achievement in written lan-
guage not only in the beginning grades but
in later years. In his first years
of school the child utilizes language skills
that he acquired in preschool years and may,
then, rely heavily on auditory clues
to words he must spell. There is some evi-
dence that auditory abilities are more im-
portant for spelling in the lower grades
than they are by the time the child
reaches the seventh grade level of spelling
ability (11). Evidence of the close rela-
tionship between auditory and spelling
abilities in the primary grades has been
suggested by such investigators as Brad-
ford (1) and Russell (10). Bradford used
a paper-and-pencil test of indi-
vidual vowel and consonant sounds and
blends and of “regularly spelled” words and
found considerable growth between the
first and second grades. Russell found
correlations between spelling and auditory
abilities ranging from the .20’s to .80’s in a
second grade group. Typical of the in-
vestigations in the area is one by Rudisill
(9) who found a correlation of .69 be-
tween spelling and phonic knowledge for
a group of 315 third grade children in
North Carolina. In another study of the
effects of phonic training at the second-
grade level, Zedler (13) found improve-
ment, after 14 hours of instruction, in both
spelling scores and speech-sound discrimi-
nation abilities.

... assistance of Bar... J. King and Gerald M. Meredith
in collecting and processing the data.

Although spelling has usually been re-
garded as one of the simpler skills ac-
quired in school, one with a heavy loading
of associative learning, Horn (7) and 
others have shown just how intricate and 
complex the relationship between sounds 
and letters may be. The apparently plain 
injunction to combine phonetic analysis 
in spelling and reading activities is not 
so simple nor so direct as it seems. Horn, 
for example, has listed six types of evidence 
which must be considered in relating audi- 
tory characteristics of words to their spell- 
ing and has presented facts about three 
of them: (a) the variation in pronuncia-
tions of the “same” words, (b) the dif-
ferent ways in which the various English 
sounds are spelled, and (c) the ways chil- 
dren actually spell sounds in common 
words. He concludes, for example, that 
there is little justification for the claim 
that children can spell the words they can 
pronounce and therefore believes that di- 
rect teaching of the large number of ir- 
regularly phonetic words is inevitable. 

Since many words must be taught di- 
rectly in the lower grades, the question 
every teacher faces is that of how to get 
children to study the words efficiently. 
Shall this child be encouraged to rely on 
visual techniques? Does that child do 
better with auditory techniques, and if 
so, which ones should he use? The present 
study is concerned with identifying audi- 
tory techniques which a child is most 
likely to find useful at the primary-grade 
level. 


PROCEDURES 


To explore further the relationships be- 
tween auditory abilities and spelling, 97 
children in the first three grades of an 
Oakland, California, school were tested. 
The numbers used were Grade I, 30;
Grade II, 32; and Grade III, 35. Since
some of the tests were not useful for 
children reading at the pre-primer level, 
the results in the study are obtained largely 
from a sample of 85 children with some- 
what less than a third of the group in 
first grade. The children came largely from 
middle or lower-middle class homes. In 
the three grades the range in CA was from 
6-8 to 10-2, in MA from 6-7 to 10-2, and 
in IQ from 83 to 119. 

The following standardized tests were 
administered: 


1. The Kuhlmann-Anderson Intelligence 
Tests 

2. The California Achievement Test— 
Spelling 

3. The Durrell-Sullivan Reading Capacity 
Test 

4. The Gates Primary (and Advanced
Primary) Reading Tests, Types I, II and
III.


In addition, six tests of auditory dis- 
crimination were given to the children. 
Since not all of these have been published 
they are described briefly, with examples. 
The group tests were: 


1. Caffrey-Russell Auditory Discrimina-
tion Test I, Same-Different. The teacher
reads pairs of words such as “shown-sewn,”
“style-style” and “mobbed-mopped” and the
child marks whether they are the “Same”
or “Different.” (This test proved to be too
easy for the group.)

2. Caffrey-Russell Auditory Discrimina-
tion Test III, telling whether words are different
in initial, middle, or final sounds. The chil-
dren mark 1, 2, or 3 (corresponding to initial,
middle, final) on an answer sheet for such
pairs as “butter-buzzer,” “pits-pitch,” and
“shoed-chewed.”

3. Durrell Test of Hearing Sounds in
Words. This test (2) consists of three sub-
tests: (a) Marking the printed word which
has an initial sound the same as a word
given orally. Example: The teacher says
“top” and the children mark one letter of
p, b, t, n, a. (b) Marking the printed word
which has the same final or beginning sound
as a word given orally. Example: The
teacher says “happen” and the child marks
one of hexameter, generation, and hydrogen.
(c) The pupil draws a circle around all
phonetic elements (such as letters, blends,
phonograms) heard in a given word. For
example the teacher says “blinding” and
the child marks the following: ind r bl x t
ing.

4. Durrell-Sullivan Reading Capacity
Test. This is a test of comprehension of
paragraphs read orally which might be called
a listening or auditory test rather than a
reading test. The eight paragraphs used were
modified slightly from the Durrell-Sullivan
Capacity Test so that raw scores were used
in computation. Each paragraph was fol-
lowed by five oral questions in which the
child marked one of three possible answers.


Two tests were given individually. These 
were: 


5. The Gates test of Giving Words with
Stated Initial Sounds described in The Im-
provement of Reading (3) in which the S is
asked to name three words which begin like
each of three words suggested by the ex-
aminer.

6. Gates test of Giving Words with Stated
Final Sounds which is described in the same
source. The child is asked to say three words
which rhyme with each of three words sug-
gested by the examiner.


RESULTS


Table 1 gives the zero-order correlations
for the various tests with spelling scores.
The table indicates that the reading tests
as a group correlate more highly with
spelling than the individual auditory tests
but that the combined group or battery of
auditory tests correlate with spelling as
highly as the Gates reading tests. The table
further suggests that, for this group, the
best test of auditory abilities in relation to
spelling is the Durrell test, composed of
three subtests. Chronological age and men-
tal age do not seem closely related to spell-
ing ability. In general, the results suggest
that rather complex auditory abilities in-
volving sound recognition in various parts
of a word are more closely related to spell-
ing ability than is recognition of sounds in
whole words as in same-different or rhym-
ing tests. The close relationship of the
Gates reading tests, especially in word
recognition, to spelling tends to confirm
an earlier finding (10). In additional
calculations, the raw scores of the
several tests were converted to standard
scores, but the correlations computed gave
about the same correlations as the raw
scores.


TABLE 1

CORRELATIONS OF VARIOUS FACTORS WITH
SPELLING ABILITY FOR 85 CHILDREN IN
GRADES I, II, AND III

                                         Zero-order
Variable                                 r With
                                         Spelling*
General
 1. Kuhlmann-Anderson Mental Age            .31
 2. Chronological Age                       .17
Reading
 3. Gates Type I                            .63
 4. Gates Type III                          .57
 5. Gates Total                             .65
Auditory
 6. Caffrey-Russell I                       .22
 7. Caffrey-Russell III                     .51
 8. Durrell Sounds                          .66
 9. Gates Initial Sounds                    .29
10. Gates Rhyming                           .22
11. Listening Comprehension                 .33
    (Durrell-Sullivan)
12. Auditory Total (Items 6 to 11)          .66

* For n = 85, r = .22 significant at 5% level, r = .28
significant at 1% level.
a n = 58.


Table 2 illustrates the use of the coeffi-
cient of multiple correlation to estimate
the relationship to spelling of four com-
bined auditory tests. The contribution to
variance may be computed by multiplying
the zero-order correlation by its standard
partial regression coefficient. In this study
the Doolittle Method was used in com-
puting the standard partial coefficients as
described in Walker and Lev (12, pp. 326-
331). Once computed, the standard par-
tial regression coefficients are inserted in
the conventional formula for the multiple
correlation, and a measure of contribution
of multiple factors to a set of scores may
be estimated. Table 2 illustrates that fac-
tors other than auditory abilities account
for spelling achievement in the group
tested but that the relationship of auditory
abilities to spelling ability is a significant
one.
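The computation described above can be sketched for the two-predictor case (Durrell Sounds and Caffrey-Russell III). The standard partial regression coefficients are found by solving the normal equations, each test's contribution to variance is its coefficient multiplied by its zero-order correlation with spelling, and the multiple correlation is the square root of the summed contributions. This is only an illustrative sketch: plain Gaussian elimination stands in for the Doolittle computing routine, the function names are assumptions, and the intercorrelation of .49 between the two tests is taken from Table 3.

```python
import math


def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def variance_contributions(intercorr, validity):
    """Return (betas, contributions, multiple R).

    intercorr: correlation matrix among the predictor tests.
    validity:  zero-order correlation of each test with the criterion.
    """
    betas = solve_linear(intercorr, validity)
    contributions = [beta * r for beta, r in zip(betas, validity)]
    return betas, contributions, math.sqrt(sum(contributions))


# Durrell Sounds and Caffrey-Russell III: r with spelling .66 and .51,
# intercorrelation .49 (Table 3).
intercorr = [[1.00, 0.49],
             [0.49, 1.00]]
validity = [0.66, 0.51]
betas, contributions, multiple_r = variance_contributions(intercorr, validity)

if __name__ == "__main__":
    print("betas:", [round(b, 3) for b in betas])
    print("contributions:", [round(c, 3) for c in contributions])
    print("multiple R:", round(multiple_r, 2))
```

For these two tests the sketch gives a multiple correlation of about .69, in line with the second row of Table 2. The same functions accept a larger matrix when further tests are added.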

TABLE 2

MULTIPLE CORRELATION AND CONTRIBUTION
TO VARIANCE OF SPELLING BY
SELECTED AUDITORY TESTS

                                Zero-order    Multiple    Contribution
Auditory Test                   r With        Corre-      to Variance
                                Spelling      lation

1. Durrell Sounds                  .66          .66           35%
(1) plus
2. Caffrey-Russell III             .51          .69           13
(1) and (2) plus
3. Gates Initial Sound             .29          .71            3
(1) and (2) and (3) plus
4. Caffrey-Russell I               .22          .72*           2

Total Variance Accounted for                                  52%

* Significant at the 1% level of confidence.


Table 3 gives some further information
about the interrelationships of the audi-
tory abilities involved in this study. It
indicates that the combined score ob-
tained on the battery of three Durrell tests
is most closely related to the total auditory
scores. As mentioned above, the Caffrey-
Russell I test as administered was too easy
for this group with a considerable num-
ber of top scores reducing the size of the
correlations and the discrimination value
of the test. It should also be noted that the
Listening Capacity Test, which was a
measure of comprehension of verbal ma-
terials, had a fairly high correlation with
the other auditory perception tests.


TABLE 3

INTERCORRELATIONS OF AUDITORY TESTS

                                              Total
         I     II    III    IV     V     VI   Score

I              .27   .18   .47   .36   .23    .41
II      .27          .49   .42   .25   .26    .69
III     .18   .49          .42   .29   .28    .79
IV      .47   .42   .42          .43   .35    .65
V       .36   .25   .29   .43          .55    .48
VI      .23   .26   .28   .35   .55           .42

Note. Code:
I    Caffrey-Russell Auditory Discrimination Test I
II   Caffrey-Russell Auditory Discrimination Test III
III  Durrell Sounds in Words Test
IV   Listening Capacity Test (adapted from Durrell-Sullivan)
V    Gates Giving Words With Same Initial Sound
VI   Gates Giving Words that Rhyme


CONCLUSIONS 


This study of 85 children in the first 
three grades revealed that some auditory 
abilities are significantly related to spell- 
ing abilities at the one per cent level of 
confidence. It began an exploration of spe- 
cific auditory abilities which are most 
closely related to spelling achievement and 
found that these were rather complex abili- 
ties involving word parts rather than whole 
words. A battery of three Durrell tests of 
word sounds and the Caffrey-Russell III 
test which involved recognition of like- 
nesses in initial, middle or final positions 
were closely enough related to spelling 
ability to warrant further study. There is 
considerable evidence that a group of audi- 
tory abilities can be good predictors of 
spelling success in the primary grades but 
the constituents of this group must be 
studied more broadly and more exactly. 
The hypothesis that different children have
quite different auditory abilities and there- 
fore should be taught spelling, and possibly 
reading, with different kinds of auditory 
techniques also needs further testing. 

In addition to the role of auditory abili- 
ties in spelling achievement, the investiga- 
tion has confirmed earlier results of the 
close relationship between the Gates tests 
of primary reading and spelling ability at 
this age. On the other hand, the factors of 
chronological age and mental age within
this fairly narrow age range are not sig- 
nificant factors in spelling ability. 

The relationship of listening compre- 
hension of oral paragraphs to the auditory 
and spelling tests is a puzzling one. Better 
measures of listening or auding ability are 
needed. If the test used in this study is a
valid one, it appears that the ability to
listen to paragraphs with comprehension
is not closely related to spelling ability
(r = .33) but that it is fairly closely
related to the combined auditory perception
or discrimination scores (r = .65). This 
fact, and the close relationship of the spell- 
ing scores to the Gates word recognition 
test indicate the presence of other factors, 
possibly visual discrimination abilities, 
which were not considered in the present 
study. 

The results further indicate the need of 
complete exploration of different kinds of 
phonetic and auditory abilities and their 
relations to spelling achievement. In addition
to the six measures used in this study,
possible tests include other tests devised
by Bradford (1), by Durrell (2), by
Holmes (6), by Roswell-Chall (8) and
others. The present study suggests that
the simpler skills of detecting same-different
word pairs or suggesting rhymes are
not so closely related to spelling achievement
as are more complex auditory abilities
such as identifying sounds in various
parts of words. Knowing when similar-sounding
syllables are alike and different
and knowing the various ways a syllable
may be spelled once it has been recognized
make the apparently simple process of
spelling more complex than it first seems.


SUMMARY 


The relation of scores on the six tests of
auditory discrimination, sometimes labelled
“phonetic skills,” to scores on intelligence,
spelling, and reading tests was determined for
85 children in the first three grades of a
California school. The results indicated that
some verbal auditory skills are significantly
related to both spelling and reading ability and
that these abilities involved recognition of word
parts rather than whole words. The relationship
of listening comprehension of paragraphs to
these scores was much lower. Considerable
contribution to spelling variance was unaccounted
for, indicating the possibility that visual
discrimination factors may be important in
spelling or that a wider range of specific kinds
of auditory skills should be tested, probably in
relation to both spelling and reading.


COMPARISON OF PRESCHOOL STANFORD-BINET
AND SCHOOL-AGE WISC IQs¹


FRANCES FUCHS SCHACHTER AND VIRGINIA APGAR 
College of Physicians and Surgeons, Columbia University 


In several studies, the Stanford-Binet, 
Form L, and the Wechsler Intelligence 
Scale for Children (WISC) were adminis- 
tered to the same children, at the same 
age, by the same examiner (2, 5, 7, 9, 
10, 11, 12, 13, 15). The following results 
were obtained: (a) The median correla- 
tion between the Stanford-Binet and the 
WISC Full Scale IQ was .85. (b) Highest 
intertest correlations obtained for the 
WISC Full Scale, next highest for the 
WISC Verbal Scale, and lowest for the 
WISC Performance Scale (2, 5, 7, 10, 11,
12, 13). (c) Mean Stanford-Binet IQs 
were significantly higher than mean WISC 
IQs (10, 11, 12, 13). (d) Significantly 
greater intertest discrepancies occurred at 
the high IQ and low age levels (10). 

Previous findings may require modification
before application to the situation
where retesting occurs at different ages 
with different examiners. The present 
study provides data to determine whether 
such modification is necessary. Preschool 
Stanford-Binets are compared with school- 
age WISCs. Comparing these two age 
levels has additional practical interest, be- 
cause the WISC cannot be used before 
age five, so that the Stanford-Binet remains 
the only major preschool intelligence test. 


1 This investigation, part of a larger re- 
search project, was supported by research 
grant (3B-9007) from the National Institute 
of Neurological Diseases and Blindness, of 
the National Institutes of Health, United 
States Public Health Service. 

The authors wish to express their apprecia- 
tion to William Langford for his guidance 
and generosity in providing the facilities 
to make this study possible, and to Arthur
Carr, and Joseph Zubin for their helpful
advice.


SUBJECTS AND PROCEDURE


Subjects (Ss) were randomly selected 
from a clinic population born at Sloane 
Hospital for Women, previously described 
by Apgar et al. (1). At preschool age 
(mean = 49.4 months, sigma = 5.9), Ss 
were asked to return to the hospital for 
Stanford-Binets. At school age (mean = 
100.2 months, sigma = 6.3), Ss were asked 
to reappear for WISCs. The average in- 
terval between tests was 50.8 months 
(sigma = 2.2). 

Of 404 Ss selected, 119 returned for both
tests in response to standard mail requests.
Six Ss were excluded from the sample. Two
were not testable, three had possible brain
damage occurring in the intertest interval,
and one had an intertest interval exceeding
the mean interval by five sigmas. The
final sample numbered 113, 61 males and
52 females. There were 39 white Ss, 66 Negroes,
6 Puerto Ricans, and 2 Orientals.

One psychologist administered all Stanford-Binets,
Form L, at the preschool age.
Another administered all WISCs at school
age. Stanford-Binet scores were withheld
from the WISC examiner until testing was
completed.


RESULTS AND DISCUSSION
Intertest Correlations 


The Stanford-Binet IQs correlated .67
with the WISC Full Scale IQs (p < .01).
Though significant, the relationship is
considerably lower than .85, the intertest
correlation at the same age, with the same
examiner. However, the .67 correlation
compares favorably with previously
reported correlations between preschool and
school-age Stanford-Binets. The median
correlation obtained from the latter
reports was found to be .74 (3, 4, 6, 8).
The Stanford-Binet IQs correlated .64
with the WISC Verbal Scale and .48 with
the WISC Performance Scale IQs (both
p < .01), both lower than the correlation
for the Full Scale IQs. This hierarchy of
correlations agrees with previous findings.
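As a rough check on the significance statements above, a correlation can be converted to a t ratio with N - 2 degrees of freedom. The sketch below assumes N = 113 for all three correlations and uses an approximate two-tailed .01 critical value of 2.62 read from standard t tables. Both are assumptions for illustration, and the source does not state which test the authors applied.

```python
from math import sqrt


def t_for_r(r, n):
    """t ratio for testing a correlation against zero, df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


N = 113                # final sample size reported in the text
CRITICAL_T_01 = 2.62   # approximate two-tailed .01 value for df near 111 (assumed)

reported = {"Full Scale": 0.67, "Verbal": 0.64, "Performance": 0.48}
for scale, r in reported.items():
    t = t_for_r(r, N)
    print(scale, round(t, 2), t > CRITICAL_T_01)
```

All three ratios fall well beyond the assumed critical value, which is consistent with the reported p < .01.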


Comparison of Mean IQs 


Table 1 provides the data to compare
the mean IQs for the present sample. It
can be seen that the mean Stanford-Binet
IQ is significantly higher than all three
mean WISC IQs. This finding too agrees
with previous reports.

However, in the present study, with
one psychologist administering all
Stanford-Binets and another all WISCs, it can be
argued that the mean differences reflected
examiner bias rather than intertest
differences. Data were available to evaluate this
alternative interpretation of the findings.
If the results reflected examiner bias, one
would expect lowest intertest correlations
and largest intertest differences where
scoring was most subjective. Since the
WISC Verbal Scale entails greater
subjectivity in scoring than the Performance Scale,
one would predict a lower intertest
correlation and larger intertest differences for
the former than the latter. Actually the
reverse was found. The Verbal Scale
correlated better with the Stanford-Binet and
showed a smaller mean intertest difference
(Table 1) than the Performance Scale.
Thus, the hypothesis of examiner bias does
not appear to be supported by the data.
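The t ratios in Table 1 can be approximately recovered from the reported means, standard deviations, and intertest correlations. The sketch below assumes a correlated-means (paired) t test in which the variance of the differences is derived from the two standard deviations and r, with N in the denominator. The source does not say that this is the formula the authors used, and the Performance Scale correlation is taken to be .48.

```python
from math import sqrt


def paired_t(mean_a, sd_a, mean_b, sd_b, r, n):
    """t for the difference between two correlated means.

    The variance of the differences is sd_a^2 + sd_b^2 - 2*r*sd_a*sd_b.
    """
    var_diff = sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b
    standard_error = sqrt(var_diff / n)
    return (mean_a - mean_b) / standard_error


N = 113
SB_MEAN, SB_SD = 104.32, 15.96

# (WISC mean, WISC SD, correlation with Stanford-Binet)
wisc = {
    "Full Scale": (98.94, 11.26, 0.67),
    "Verbal": (100.14, 11.40, 0.64),
    "Performance": (97.87, 13.07, 0.48),  # r = .48 is an assumed reading
}

t_values = {
    scale: paired_t(SB_MEAN, SB_SD, mean, sd, r, N)
    for scale, (mean, sd, r) in wisc.items()
}
for scale, t in t_values.items():
    print(scale, round(t, 2))
```

Under these assumptions the computed ratios are roughly 4.8, 3.6, and 4.6, close to the 4.89, 3.60, and 4.57 shown in Table 1. The small Full Scale discrepancy is of the size expected from rounding in the published correlation.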


Effect of IQ, Age, and Sex


An analysis of variance was performed
to see if the observed difference between
the mean Stanford-Binet and WISC Full
Scale IQs was related to IQ, age, or sex.
Three Stanford-Binet IQ subgroups were
created: below 90, 90 to 110, and above
110. Two age subgroups were formed, below
eight years and above eight. The number
in each subgroup is shown in Table 2.
Since the numbers were unequal, it was
necessary to use the Walker-Lev (14)
approximate method of analysis of variance.


TABLE 1
COMPARISON OF MEAN STANFORD-BINET AND WISC IQs
(N = 113)

                                      WISC
Measure             S-B        FS       VS       PS
Mean              104.32     98.94   100.14    97.87
SD                 15.96     11.26    11.40    13.07
t: S-B vs. WISC               4.89*    3.60*    4.57*

* Significant at the .001 level.


TABLE 2
EFFECT OF IQ, AGE, AND SEX ON MEAN INTERTEST DIFFERENCES

Variable        N      Meanᵃ       F
S-B IQ                           12.83*
  <90          19      -2.68
  90-110       59      +2.39
  >110         35     +14.86
WISC Age                          1.12
  >8           84      +4.76
  <8           29      +7.24
Sex                               1.50
  Male         61      +3.64
  Female       52      +7.46

ᵃ Minus (-) denotes WISC higher than S-B; plus (+) denotes S-B
higher than WISC.
* Significant at the .001 level.

The results shown in Table 2 indicate
that age and sex had no effect on intertest
differences, while IQ did. Individual t
tests revealed that the difference between
low and average IQ levels was significant
at the .05 level (t = 2.26), while the
differences between low and high IQ levels
(t = 6.59) and average and high IQ levels
(t = 5.47) were significant at the .01 level.
None of the interactions was significant.
The results indicate that increments in the
preschool Stanford-Binet IQ increase the
likelihood that it will be significantly
higher than the school-age WISC IQ, and
that the greatest intertest discrepancies
occur at the high IQ levels.


² Since some Ss had higher Stanford-Binet
IQs relative to their WISC IQs while others
had higher WISC IQs, intertest differences
were transformed to a unidirectional scale
to calculate the analysis. The scale used
assumed that zero intertest difference equals
30 points.

The finding of greatest intertest differ- 
ences at the high IQ levels agrees with 
previous research. The data on age appear 
to differ with the previous report (10) of 
a greater intertest discrepancy at low ages. 
However, since the age range of the previ- 
ous study was eight years larger than that 
of the present study, the negative findings 
may be attributed to the relative homo- 
geneity of the sample. There have been no 
previous reports of sex differences in rela- 
tion to differences between the Stanford- 
Binet and the WISC. 


Effect of Race and Nationality 


Since the sample contained both white 
Ss and Negroes, it was possible to study 
the effect of race on intertest differences, 
The results showed that both white Ss 
and Negroes obtained higher Stanford- 
Binet IQs relative to their WISC IQs, 7.74 
and 5.18 mean points, respectively. A t
value of 1.02 indicated no significant dif- 
ference between the means for both races. 

Though the sample also contained six 
Puerto Ricans and two Orientals, their 
number was too small to permit compari- 
son. However, it was necessary to demon- 
strate that these eight Ss did not signifi- 
cantly affect the results for the remaining 
sample. A comparison of samples includ- 
ing and excluding the eight Ss showed that 
both samples obtained higher Stanford- 
Binet IQs relative to their WISC IQs, 5.60 
and 4.60 mean points, respectively. A t
value of .70 comparing the means was not 
significant, indicating that the eight Ss 
did not significantly affect the results. 


SUMMARY


Previous investigators have compared 
the Stanford-Binet and WISC IQs of Ss 
retested at the same age by the same ex- 
aminer. The present study attempted to 
ascertain whether previous findings apply 
to the situation where retesting occurs at 
different ages by different examiners. 

Ss were randomly selected from a neonatal
clinic population in New York City.
One psychologist administered all
Stanford-Binets at preschool age; another all
WISCs at school age.

Results show that the intertest correla- 
tion is decreased from .85, for retesting at 
the same age by the same examiner, to .67, 
for retesting at different ages with different 
examiners. However, the .67 correlation 
compares favorably with similar retest 
findings for the Stanford-Binet itself. In 
other respects, results support previous 
findings. Highest intertest correlations were 
obtained for the WISC Full Scale IQ, lowest
for the WISC Performance Scale IQ;
the mean Stanford-Binet IQ was significantly
higher than the mean WISC IQ;
and greatest intertest discrepancies
occurred at high IQ levels. The results
appeared comparable for white Ss and
Negroes. Further, a small group of eight
Puerto Rican and Oriental Ss did not
appear to affect the findings.


CONSERVATION OF TEACHING TIME THROUGH THE USE OF
LECTURE CLASSES AND STUDENT ASSISTANTS


RUTH CHURCHILL AND PAULA JOHN 
Antioch College 


In 1956-7, a member of the mathematics
department at Antioch College¹ became
interested in contrasting what students
learned when they were taught in small 
lecture-discussion sections with a labora- 
tory led by the instructor and what they 
learned when the instructor lectured to a 
large class which was led in small group 
discussions and laboratories by a student 
assistant. He hoped that two kinds of sav- 
ings in teaching time could be made: first, 
each section takes as much teaching time 
and time used in preparation as does a 
single large lecture group. Second, if up- 
perclass students can substitute for in- 
structors in the laboratory, the instructor 
is freed for additional hours by a less 
highly skilled person. In the experimental
year, teaching by sections took 18 hours a
week of faculty time; teaching by lecture 
and student-led laboratories took four 
hours a week of faculty time and ten 
hours of a student assistant’s time. 
The specific hypotheses formulated were 
that: 

1. Students would learn skills and
understandings relevant to the objectives of
a course in fundamentals of mathematics,
and this learning would be independent of
the method of teaching the course. The
specific methods contrasted were small
lecture-discussion sections with a laboratory,
all led by the instructor, and large
lecture class, with discussions and laboratory
led by an undergraduate student assistant.

2. Student attitudes towards the course
would also be independent of the method
of teaching, that is, equally satisfying
situations could be created under both methods.


¹ Acknowledgements should be made to
Gustave Rabson, who initiated the experiment
and taught the classes involved; to
Joan Pomerantz, senior major in government
and mathematics, who served as the laboratory
assistant; and to Lawrence Balch,
mathematics major, who served as an essay
grader.


METHODS


The following procedure was set up: In
one division² the mathematics course was
taught in two sections, ranging in size from
20 to 30 students, in the usual manner,
three meetings a week in which lectures
by the instructor were combined with
questions and discussion by the students.
In addition, there was a weekly laboratory
session (usually an hour long), also led by
the instructor. In the other division, the
instructor lectured twice a week to a class
of about 70 students. This group had two
laboratory sessions each week, in which
all discussion, questions, and help were
handled by a student assistant.

Two aspects of learning in the course
were selected for evaluation: background
in skills and understanding of the nature
of mathematics. A 126-item multiple-choice
test was used to measure skills.
Understanding of the nature of mathematics,
considered especially important in
terms of the objectives of the course, was
measured by a short essay.

Student attitudes towards the course
were measured by student ratings of the
instructor and by direct questions about
the course. The student evaluation of the
instructor employed a rating scale involving
five ratings: clarity of presentation,
interest in the student, arousing interest in
the subject matter, making learning active,
and knowledge of the subject matter.
Student evaluation of the course consisted of
three open-ended questions: What aspects
of the course did you like most? What
aspects of the course did you like least? In
what ways could the course be improved?


² At Antioch, because of the cooperative
work-study plan, the student body is in two
divisions, which alternate on campus, one
division studying while the other is away
working. Thus, the two groups, or divisions,
of students in the experiment were not on
campus at the same time.

Students took the background test and
wrote essays at the beginning and the end
of the course, which was 20 weeks long.
The instructor was rated in the fourteenth
week of the course, and the course
questionnaire administered in the last week.

Unfortunately, the method used for
answering and scoring the background test
did not permit ascertaining its reliability
easily. Students were instructed not to
guess but to mark all answers possibly
right. The right answers were weighted to
equal the sum of the possible wrong
answers; the score was the weighted sum of
right answers minus one point for each
wrong answer. The only available data
bearing on the reliability of the test was
a correlation of .74 between pre- and
posttest scores for 18 students in a section of
this course not in the experiment. On the
whole, the test is probably sufficiently
reliable to detect group differences.

Since there are no objective standards
for measuring understanding of mathematics,
the validity of the essay as a measure
of understanding depended on the
grading scale evolved and its reliability.
The procedure for grading the essays was
to group together all the essays, pre- and
post-, from sections and lecture class,
for each of the four topics and to grade
each topic separately. All identification
both of student and of time was removed
from the papers. Two graders were used;
they had available model essays written
by the instructor; and they agreed on a
general definition of understanding of
mathematics. The correlation between the
two graders was .67. Since the sum of the
two grades was used as the final grade, the
reliability of the essay corrected by the
Spearman-Brown formula for doubling
the length is .80.
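The correction just described can be reproduced with the general Spearman-Brown formula, where k is the factor by which the measure is lengthened (here k = 2, since two graders' marks are summed). The function name is illustrative only.

```python
def spearman_brown(r, k=2.0):
    """Estimated reliability when a measure is lengthened k times.

    r is the reliability (here, the inter-grader correlation) of one unit.
    """
    return k * r / (1 + (k - 1) * r)


# Inter-grader correlation of .67, two graders summed
print(round(spearman_brown(0.67), 2))  # 0.8
```

With r = .67 the formula gives 1.34 / 1.67, or about .80, which agrees with the value reported for the summed essay grade.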


RESULTS AND DISCUSSION 


1. The results in Table 1 indicate that, 
in terms of pretest scores on the back- 
ground test and on the essays, students 
taking the course in sections did not differ 
from those taking it in the lecture class. 

2. The data in Table 1 indicate that 
both the sections and the lecture class 
gained significantly from pre- to posttest 
on the background test and on the essays. 
They further indicate that the sections 
did not differ from the lecture class in 
amount of gain. 

3. When the student ratings of the in- 
structor for the two sections are com- 
pared in Table 2 with the ratings for the 
lecture class, two significant differences 
are apparent. The large lecture class rated 
the instructor significantly poorer in clar- 
ity of presentation, and in general they 


TABLE 1
PRE- AND POSTTEST SCORES AND GAINS MADE BY SECTIONS AND
LECTURE CLASS ON THE BACKGROUND TEST AND ESSAYS

                   Sections A and B    Lecture class    Sign. of
Test                 Mean      SD       Mean      SD     Diff.
Background Test        (N = 47)           (N = 59)         t
  Pretest           115.6    36.8      113.9    40.3      0.6
  Posttest          192.8    42.5      187.8    47.9      0.2
  Gain               77.1    36.1       73.9    40.4      0.3
  t (Gain)          14.50*              13.94*
Essay Test             (N = 41)           (N = 54)         F
  Pretest            17.0     7.6       18.6     6.4      0.8
  Posttest           22.7     4.6       24.9     7.1      1.6
  Gain                5.7     8.0        6.3     7.7      0.1
  t (Gain)            4.5*                6.0*

* Significant at the 1% level.


TABLE 2
STUDENT RATINGS OF THE INSTRUCTOR

                                 Sections A and B   Lecture class   Sign. of
Trait                                (N = 46)         (N = 55)       Diff.
                                   Mean     SD       Mean     SD
Presents material clearly           2.0    1.1        3.0    1.6     12.8*
Displays interest in student        1.7    1.3        2.0    1.0      2.0
Arouses interest in subject matter  2.2    1.2        2.4    1.2      0.8
Makes learning active               2.0    1.2        2.4    1.2      2.9
Knows material                      1.5    0.9        1.9    1.0      3.8
Over-all                            9.5    4.2       11.8    4.6      6.9*

Note. The lower the rating, the more favorable.
* Significant at the 1% level.


rated him slightly poorer on all traits so
that the over-all rating in the lecture class
is significantly poorer than that received
in the sections. In both classes, however,
the instructor was rated well above average
when compared with a sample of the
whole faculty.

Table 3 summarizes both the comments
made by students on their ratings of the
instructor and their answers to the three
open-ended questions used to rate the
course rather than the instructor. Students
in both the small sections and the
lecture class responded to the questions
on the most and least liked aspects of the
course predominantly in terms of content.
When attention is focused on the instructor
rather than the course, content drops out
as a relevant category. Other than this
major difference, evoked by the different
structuring of the two situations, the two
kinds of comments were similar.

The instructor’s presentation of the material
was clearly the most important variable
present in both kinds of comments.
Presentation was commented on more often
favorably and less often unfavorably
in the sections than in the lecture class,
significantly so in the case of unfavorable
comments on the instructor. Another major
factor, mentioned only unfavorably,
was the rapid pace of the course. When
commenting on the instructor, the lecture
class made significantly more unfavorable
comments on pace. The significantly
greater number of unfavorable comments
made by the lecture class on presentation
and pace support the significantly poorer
rating which they gave the instructor on
clarity of presentation.


TABLE 3
PERCENTAGE OF STUDENTS IN SECTIONS (S) AND LECTURE (L) MAKING EACH
COMMENT ON COURSE AND INSTRUCTOR

On course (N = 48S, 56L) On instructor (N = 46S, 55L)
Comment Aspects liked-
[ese ee . He
Most Least improved? Favorable Unfavorable
TA Fa a rap
SJL S L S
Content &@ jæ jso jso jas ja |o |o lat |i,
Presentation 5o 39 jaz |21 faz |23 |78 |67 | ase 40",
i 0 |23 e
Laboratories 2** | 30** | o* a ioe ge a J ý
Class size® 2 0 2 7 2** | 20** | 0 4 15
Examinations 8 T 4 4 0 2 4
Everything good -| 4 5 4 4 3 12
Miscellaneous 12 8 4 2 |16 = 7 | 0 | 2 | 5

* χ² yielded difference significant at 5% level.
** χ² yielded difference significant at 1% level.
ᵃ Class size too large placed in unfavorable categories.

The sections and the lecture class
disagreed markedly about the laboratory,
which was a favorable feature for the
lecture group but not mentioned by the
sections. This difference can be accounted for
in terms of differences in instructional
procedure: for the sections the laboratory
was only another meeting with the instructor
while in the large class the small
laboratory groups, which met with the
student assistant, were a distinct feature.

In respect to the hypothesis relating to
student attitudes towards the course the
final position must be a qualified rejection
of the null hypothesis. The lecture class
was somewhat less satisfied, particularly
in respect to clarity of presentation; but
the lecture class commented favorably on
the laboratory. The lower ratings of the
instructor on clarity of presentation may
have occurred because part of his function
had been taken over by the laboratory
assistant.
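The pre- to posttest gain ratios reported in Table 1 can be approximately recovered from the mean gain, the standard deviation of the gains, and the group size. The sketch below assumes a one-sample t on the gains with a standard error of SD / sqrt(N - 1), the convention when the standard deviation is computed with N in the denominator. The source does not state the formula used, so this is an illustrative reconstruction.

```python
from math import sqrt


def gain_t(mean_gain, sd_gain, n):
    """t ratio for a mean gain, with standard error SD / sqrt(n - 1)."""
    return mean_gain / (sd_gain / sqrt(n - 1))


# (mean gain, SD of gain, N, t reported in Table 1)
table_1_gains = {
    "Background, sections": (77.1, 36.1, 47, 14.50),
    "Background, lecture": (73.9, 40.4, 59, 13.94),
    "Essay, sections": (5.7, 8.0, 41, 4.5),
    "Essay, lecture": (6.3, 7.7, 54, 6.0),
}

for label, (mean_gain, sd_gain, n, reported_t) in table_1_gains.items():
    print(label, round(gain_t(mean_gain, sd_gain, n), 2), reported_t)
```

Under this assumption each computed ratio lies within a few hundredths of the tabled value, the remaining differences being of the size expected from rounding in the published means and standard deviations.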


SUMMARY


1. The problem of whether or not faculty
time can be conserved through teach-
ing in a large lecture class rather than in 
small sections and through replacing the 
instructor in the laboratory by an under- 
graduate student assistant was investi- 
gated by having the same instructor teach 
equated groups of typical students the 
same general education course in mathe- 
matics under two conditions: small lec- 
ture-discussion sections with a laboratory 
conducted by the instructor and a large 
lecture class with a laboratory conducted 
by a student assistant. 

2. On pre- and posttests on a test of 
relevant content and on an essay graded 
for understanding of mathematics, the two 
types of classes did not differ in amount 
of gain and both gained significantly and 
substantially. 

3. Although students in both types of 
courses rated the instructor and the course 
satisfactory, the lecture class was less sat- 
isfied than the sections. However, the 
comments in the lecture class indicated 
that the laboratory helped to meet student 
needs for discussion in which they could 
clarify the lecture for themselves. 


Received July 17, 1968. 


A STUDY OF CONSISTENT DISCREPANCIES BETWEEN
INSTRUCTOR GRADES AND TERM-END
EXAMINATION GRADES¹


ELDON G. KELLY 


North American Aviation, Inc., Canoga Park, California 


During the past decade, a number of 
investigations have been made to discover 
the relationship of factors other than
intelligence and aptitude scores to students’
level of achievement. In general, students 
have been selected for such studies on the 
basis of discrepancies between their pre- 
dicted level of achievement, as determined 
by aptitude tests and other criteria, and 
their actual level of achievement. 

Overachieving college students have 
been described in one report as likely to 
have “less fortunate backgrounds.” Fac- 
tors such as social enjoyment and prestige 
were generally found to have influenced 
the underachievers’ decision to attend col- 
lege (7). Other investigators have found 
certain personality factors, e.g., tendencies 
toward maladjustment (1), superego sta- 
tus (11), and overconformity (10), to 
have some influence on level of achieve- 
ment. Studies of the effects of remedial 
reading programs on achievement suggest 
that such programs result in improved 
achievement for some students (8, 9). 
Instructor grades appear to have been the 
only measure of achievement used in the 
above studies. 

The purpose of the investigation to be
discussed here was to discover some of the
factors responsible for differences in
achievement as measured by instructors’
ratings and by common departmental
term-end examinations. It appeared that
some students in the Basic College General
Education Program at Michigan State
University quite consistently received a
higher grade on their term-end examination
than they received from their instructors
in the Basic College courses, while
other students seemed to be equally consistent
in getting the higher of the two
grades from their instructors.


¹ This study was part of a doctoral
dissertation completed in 1956 under the
direction of Paul L. Dressel and Walter F.
Johnson, Jr.

The curriculum in the Basic College
embodies four comprehensive areas:
Communication Skills, Natural Science, Social
Science, and Humanities. Each of the
basics consists of three courses taken in
sequence, and both instructors and students
are provided a common syllabus for
each course.

Students’ final grades in each of the
Basic College courses are derived from
instructors’ ratings and performance on
departmental term-end examinations, each
of which counts 50 per cent in the
determination of the final grade.² Prior to
conversion to the final letter grade, both
instructor grades and term-end examination
grades are assigned from a 15-point scale,
with a score of one corresponding to F
minus and a score of 15 corresponding to
A plus. For each student who completes
the Basic College Program, there is a
record of 12 instructor grades, 12 term-end
examination grades, and 12 final letter
grades. The coefficient of correlation
between mean instructor grades and mean
term-end examination grades of all Basic
College students is generally approximately
.80.


² Departmental term-end examinations are
multiple-choice tests constructed by the
Basic College Evaluation Services, a
noninstructional department which helps
develop, coordinate, and administer the
program of examinations and evaluation in
conjunction with the various departments
involved.


PROCEDURE


Students whose instructor grades were
generally higher than their term-end
examination grades would appear to be
characterized by traits which enhanced
their performance in the structure of
classroom activities and which commended
them to their instructors. Thus, the
hypothesis was presented that students who
generally received the higher grade from
their instructors were more insecure,
compulsive, conforming, and rigid than
students who generally received the higher
grades on the term-end examinations. The
Inventory of Beliefs test was used to test
this hypothesis.³ This test consists of 120
statements with directions requesting the
student to respond to each item in terms
of the following key: 1. strongly agree, 2.
agree, 3. disagree, and 4. strongly disagree.
Since all of the statements should elicit
disagreement, low scores are obtained by
individuals who are characterized in terms
of the above hypothesis, with the opposite
being true of students obtaining high
scores (4).

Basic College departmental term-end
examinations are multiple-choice tests
which are cumulative and increasingly
comprehensive from term to term. These
tests require a considerable amount of
reading during the examination period.
Thus, a second hypothesis presented was
that the reading ability of students who
generally received the higher grade from
their instructors was inferior to that of
their opposites. The Michigan State
University Reading Test, designed by members
of the Basic College Evaluation
Services, was used to test this hypothesis.
This test yields a vocabulary score, a
comprehension score, and a total score.

Students whose performance on the
term-end examinations was consistently
short of expectations evolving from their
instructor ratings might well learn to an-
ticipate the term-end examination as a
threatening, anxiety-producing experience.
The Taylor Anxiety Scale was used to test
an hypothesis presented with respect to
this problem, with the unsurprising result
that anxiety thus measured was shown to
be unrelated to the problem. Beier has
demonstrated, however, that induced anx-
iety can impair certain aspects of intellec-
tual functioning, resulting in impaired
performance on tests.⁴

³ Developed by the Intercollege Com-
mittee on Attitudes, Values, and Personal
Adjustment: The Cooperative Study of
Evaluation in General Education of the
American Council on Education.

The assumption underlying these hy- 
potheses was that differences in general 
scholastic aptitude and intelligence were 
not related to the phenomenon to be stud- 
ied. Nevertheless, it seemed inappropriate 
completely to disregard these factors, and 
comparisons on ACE scores were also 
made. The ACE, like the MSU Reading 
Test, is taken by all entering freshmen and 
it was the scores that the students in the 
study made on these tests at the time of 
enrollment which were used for the in- 
vestigation. 

Three groups of students were selected 
for the investigation of the problem: (a) 
students whose instructor grades were 
generally higher than their term-end ex- 
amination grades (higher instructor grade 
group); (b) students whose term-end ex- 
amination grades were generally the higher 
(higher examination grade group); and 
(c) students whose instructor grades and 
term-end examination grades were gener- 
ally about the same (nondeviant grade 
group). The latter group was selected for 
purposes of comparison with both of the 
two groups above to determine if these 
two extreme groups were different from a
nondeviant grade group as well as from
each other with respect to the factors
studied.

⁴ The t technique was used in preference
to the generally more appropriate analysis
of variance for data of this type because of
the investigator's interest in making pair-
wise comparisons of the groups.

Statistical calculations for the selection
of the above groups were based upon the 
instructor grades and term-end examina- 
tion grades obtained by populations of 
565 males and 469 females during their 
completion of the 12 Basic College courses. 
Other investigators have found that 
women tend to get higher grades than men 
from instructors, while men tend to get 
higher grades than women on standard 
achievement tests (2, 12). To avoid this 
bias, means of the accumulative sums of 
examination and instructor grades, mean 
differences between the accumulative 
sums, and standard deviations of the dif- 
ferences were computed separately for 
men and women. Men and women thus 
selected were jointly assigned to their ap- 
propriate groups. (Women received both 
higher instructor grades and higher ex- 
amination grades than men. While wom- 
en’s examination grades were only slightly 
higher than their instructor grades and 
only slightly higher than the men’s exami- 
nation grades, women’s instructor grades 
were substantially higher than the men’s.) 
In order to limit the study to extreme 
cases, only those men and women were 
selected whose differences between their 
summed examination grades and summed 
instructor grades placed them at least two
standard deviations beyond the mean dif- 
ference (E-I) between the accumulative 
sums of examination and instructor grades 
of the total male and female populations 
respectively. 
The above method of selection identified 
42 students as consistently obtaining 
higher grades from instructors and 54 as 
consistently obtaining higher grades on 
the term-end examination. Of these num-
bers, 29 students in the higher instructor
grade group (14 males and 15 females) 
and 32 students in the higher examination 
grade group (20 males and 12 females) 
cooperated throughout the study. The
nondeviant grade group was comprised of
32 students whose differences between in- 
structor grades and examination grades 
placed them within one third of one stand- 
ard deviation of the mean difference be- 
tween the accumulative sums of examina- 
tion and instructor grades. 
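
The selection rule described above can be sketched in code. This is a minimal illustration and not the investigator's actual computation: the function name `classify_students`, the record layout, and the use of the population standard deviation are assumptions, as is applying the one-third band separately by sex. The grades in the usage lines are invented solely to exercise the rule.

```python
from collections import defaultdict
from statistics import mean, pstdev


def classify_students(records, extreme_sd=2.0, nondeviant_sd=1.0 / 3.0):
    """Assign each student to a group from summed grades.

    records: list of (student_id, sex, exam_sum, instructor_sum).
    The difference E - I is compared with the mean and standard
    deviation of E - I for the student's own sex (assumption: the
    population SD is used; the article does not say which).
    """
    diffs_by_sex = defaultdict(list)
    for _, sex, exam_sum, instructor_sum in records:
        diffs_by_sex[sex].append(exam_sum - instructor_sum)
    stats = {sex: (mean(d), pstdev(d)) for sex, d in diffs_by_sex.items()}

    groups = {}
    for student_id, sex, exam_sum, instructor_sum in records:
        centre, spread = stats[sex]
        deviation = (exam_sum - instructor_sum) - centre
        if abs(deviation) <= nondeviant_sd * spread:
            groups[student_id] = "nondeviant"
        elif deviation >= extreme_sd * spread:
            groups[student_id] = "higher examination"
        elif deviation <= -extreme_sd * spread:
            groups[student_id] = "higher instructor"
        else:
            groups[student_id] = None  # not selected for the study
    return groups


# Hypothetical summed grades for ten students of one sex.
example_diffs = [0, 1, -1, 0, 1, -1, 0, 0, 12, -12]
example_records = [(i, "M", 100 + d, 100) for i, d in enumerate(example_diffs)]
example_groups = classify_students(example_records)
print(example_groups)
```

Computing the mean and standard deviation within each sex before applying the cutoffs mirrors the article's stated reason for separate calculations: women and men differed in how their instructor and examination grades compared.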

Members of the higher instructor grade 
group and higher examination grade group 
were interviewed prior to testing. Informa- 
tion gained from the interviews is dis- 
cussed below. 


RESULTS


After a brief, standard description of the 
problem, students in the higher instructor 
grade and higher examination grade
groups were asked if they knew which
category described their performance.
Only one interviewee (in the higher in-
structor grade group) was unaware of the
direction of her grades. Following the re- 
sponse to the above query, structuring of 
the interviews was restricted to the ques- 
tion: “How do you account for this?” 

In response to the above question, 15
students in the higher instructor grade
group expressed fear of the term-end ex-
amination; 11 students in the higher in-
structor grade group stated that too often
information required on the term-end
examination did not correspond to work
covered in class; several labeled the ex-
amination “too ambiguous”; and some
complained that both the tests as a whole
and individual items were too long, re-
quiring too much reading in the time al-
lotted for the test.

By contrast, the comments of 25 of the
32 students in the higher examination
grade group were interpreted to indicate
a lack of motivation for and indifference
toward the Basic College courses. Students
in the higher examination grade group
generally saw the disparity between their
examination and instructor grades as a
phenomenon of their own making, while
their much more anxious opposites tended
to see their circumstance as a rather
threatening problem which had eluded
remedy.

Mean grades presented in Table 1 be-
low suggest, indeed, that students in the
higher instructor grade group had some
reason to feel threatened by the term-end
examination. Their mean examination
grade, about D plus, was, however, what
one might expect of this group. In finding
a group with higher instructor grades, one
might expect to find their examination
grades below average. Conversely, higher
examination grades would seem to be as-
sociated with below average instructor
grades. Instead, we find the average in-
structor grade to be the same for the two
groups (C plus), and the mean examina-
tion grade of the higher examination
grade group considerably above average (B
plus). The extremely high coefficients of
correlation between mean instructor grades
and mean examination grades seen in
Table 1 are artifacts of the selection
method. This artifact of selection mani-
fests itself each time both mean instructor
and mean examination grades are com-
pared with a third variable.
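
The selection artifact can be illustrated with a small simulation. It is a sketch under stated assumptions: standardized normal scores with a population correlation of .80 (the figure quoted for all Basic College students), a hypothetical sample of 5,000, and selection of a nondeviant band lying within one third of a standard deviation of the mean difference. None of the simulated values come from the study, and the names `pearson_r` and `simulate_selection` are illustrative.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_selection(n=5000, rho=0.80, seed=0):
    """Return (r for everyone, r inside the selected nondeviant band)."""
    rng = random.Random(seed)
    instructor, exam = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        instructor.append(z1)
        # Build an examination score correlating rho with the instructor score.
        exam.append(rho * z1 + sqrt(1 - rho ** 2) * z2)

    diffs = [e - i for e, i in zip(exam, instructor)]
    mean_d = sum(diffs) / n
    sd_d = sqrt(sum((d - mean_d) ** 2 for d in diffs) / n)

    # Keep only cases whose E - I lies within one third of an SD of the mean.
    band = [(i, e) for i, e, d in zip(instructor, exam, diffs)
            if abs(d - mean_d) <= sd_d / 3]
    r_all = pearson_r(instructor, exam)
    r_band = pearson_r([i for i, _ in band], [e for _, e in band])
    return r_all, r_band


r_all, r_band = simulate_selection()
print(f"all cases: r = {r_all:.2f}; selected band: r = {r_band:.2f}")
```

Because every case in a selected group has nearly the same difference between the two grades, one grade is almost a constant offset of the other, and the within-group correlation rises well above the value for the whole population.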


Tests of Significance of Differences 
In comparing the mean ACE scores,
mean reading scores, and mean Inventory
of Belief scores (see Table 2), considerable
differences were found to exist between
the higher instructor grade group and
both of the other two groups. The mean
ACE scores of both the higher examina-
tion grade group and the nondeviant
group were significantly higher than the
mean ACE score of the higher instructor
grade group. The difference between the
mean ACE scores of the higher examina-
tion grade group and the higher instructor
grade group was significant beyond the
.001 level of confidence, while the differ-
ence between mean ACE scores of the
higher instructor grade group and the non-
deviant grade group was significant be-
yond the .01 level of confidence.


TABLE 1
MEAN INSTRUCTOR AND MEAN EXAMINATION GRADES, STANDARD DEVIATIONS, AND
COEFFICIENTS OF CORRELATION BETWEEN MEAN INSTRUCTOR AND MEAN EXAMINATION GRADES

Deviate Groups        Mean E Grade     SD    Mean I Grade     SD      r
Higher I Grades            6.81       1.29       9.15        1.21    .95
Higher E Grades           11.63       1.46       9.06        1.49    .93
Non-Deviate Grades         9.36       1.46       9.44        1.46    .98

Superior reading ability set the group 
with the higher examination grades apart 
from the other two groups. The mean 
reading score of the higher examination 
grade group is significantly higher than 
that of the nondeviant grade group be- 
yond the .001 level of confidence, while 
the mean reading score of the latter group 
is significantly higher than that of the 
group with the higher instructor grades be- 
yond the .001 level of confidence. The 
very small variance among the reading
scores of the higher instructor grade group 
is one of the striking features of this group. 

Mean Inventory of Belief scores re- 
vealed no difference between the higher 
examination grade group and the non- 
deviant group, but, as the data in Table 2 
indicate, the mean Inventory of Belief 
scores of both these groups were found to 
be significantly higher than that of the 
higher instructor grade group beyond the 
.001 level of confidence. The group getting 
higher instructor grades was thus char- 
acterized as being more compulsive, in- 
secure, rigid, and conforming. 
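
The pairwise comparisons can be approximated from the summary figures alone. The sketch below assumes a pooled-variance t for independent groups and uses the group sizes reported in the Procedure section (29, 32, and 32) together with the means and variances of Table 2. The article does not state which variant of the t formula was used, so the function `pooled_t` is an illustration only.

```python
from math import sqrt


def pooled_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Pooled-variance t for two independent groups, from summary statistics.

    Assumes var_a and var_b are sample variances (divisor n - 1).
    Returns (t, degrees of freedom).
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Higher examination grade group versus higher instructor grade group.
t_ace, df_ace = pooled_t(114.06, 302.06, 32, 92.96, 346.17, 29)
t_reading, df_reading = pooled_t(57.97, 154.10, 32, 36.93, 59.37, 29)
print(f"ACE: t = {t_ace:.2f}, df = {df_ace}")
print(f"Reading: t = {t_reading:.2f}, df = {df_reading}")
```

Under these assumptions the ACE comparison comes out near 4.6 and the reading comparison near 7.9, close in magnitude to the 4.54 and 7.82 entered in Table 2; the small differences are what one would expect if a slightly different form of the formula was used.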


Correlation Analysis 


Estimates of the relationship of stu-
dents’ ACE scores, reading scores, and
Inventory of Belief scores to their mean
instructor and mean examination grades
(see Table 3) resulted in coefficients of
correlation which, in general, require little
comment.


TABLE 2
MEAN TEST SCORES, VARIANCES, AND TESTS OF SIGNIFICANCE OF DIFFERENCES BETWEEN
EACH OF THE THREE GROUPS

Tests/All Groups         Mean    Variance        t    Compared with
ACE
  Higher E Grades      114.06      302.06   -4.54**   Higher I Grades
  Higher I Grades       92.96      346.17   -2.97*    Nondev. Grades
  Nondev. Grades       108.18      447.40    1.20     Higher E Grades
MSU Reading Test
  Higher E Grades       57.97      154.10   -7.82**   Higher I Grades
  Higher I Grades       36.93       59.37   -4.15**   Nondev. Grades
  Nondev. Grades        47.81      144.34    3.30     Higher E Grades
Inventory of Beliefs
  Higher E Grades       79.29      203.11   -3.30**   Higher I Grades
  Higher I Grades       66.17      304.49   -3.31**   Nondev. Grades
  Nondev. Grades        78.34      116.91     .30     Higher E Grades

* Significant beyond the .01 level of confidence.
** Significant beyond the .001 level of confidence.


TABLE 3
CORRELATION COEFFICIENTS SHOWING RELATIONSHIP BETWEEN STUDENTS’ TEST SCORES
AND MEAN EXAMINATION AND INSTRUCTOR GRADES

Variables Correlated                       Higher E      Higher I      Nondeviate    r tri (a)
                                           Grades        Grades        Grades
ACE & Mean E Grades                        [illegible]   .77           [illegible]
ACE & Mean I Grades                        [illegible]   .86           [illegible]
ACE Scores for All Groups and (E - I)                                                .45
Reading & Mean E Grades                    [illegible]   [illegible]   [illegible]
Reading & Mean I Grades                    [illegible]   [illegible]   [illegible]
Reading Scores for All Groups and (E - I)                                            .68
IB & Mean E Grades                         [illegible]   [illegible]   [illegible]
IB & Mean I Grades                         [illegible]   [illegible]   [illegible]
IB Scores for All Groups and (E - I)                                                 .52

(a) Triserial relationships of test scores of all groups to differences between
examination and instructor grades.


The magnitudes of the coefficients ob-
tained in estimating the relationship of
ACE scores to mean examination and
mean instructor grades for the higher in-
structor grade group, .77 and .86 respec-
tively, are considerably greater than those
obtained for the other two groups and
greater, too, than customarily found.
Within this group, apparently, both the ex- 
amination and the instructor ranked the 
students quite consistently in relation to 
ability. There are personal qualities at 
work, however, which seem to commend 
the student to the instructor, resulting in 
higher grades from instructors. 

The values obtained in estimating the 
relationship of reading scores and Inven-
tory of Belief scores to mean instructor
and mean examination grades are similar 
to those values usually found in using these 
instruments. The Inventory of Beliefs 
typically yields a rather wide range of 
scores. 

Analysis of the relationship of the 
differences between students’ examination 
and instructor grades to test scores re- 
vealed rather substantial evidence that for 
these groups higher aptitude scores, read- 
ing scores, and Inventory of Belief scores 
were positively related to tendencies to 
get the higher grade on the term-end ex- 
amination. Jaspen’s formula for triserial 
correlation was used to determine the rela- 
tionship of difference between examination 
and instructor grades to test scores (5). 
A triserial coefficient of .45 was obtained 
in estimating the relationship of difference 
between examination and instructor grades
to ACE scores; a coefficient of .68 was ob- 
tained in correlating reading scores with 
these grade differences; and a coefficient 
of .52 was obtained in estimating the re- 
lationship of Inventory of Belief scores to 
differences between examination and in-
structor grades.


DISCUSSION 


The mean instructor and examination
grades of the higher instructor grade group
were in keeping with the investigator's
expectations, i.e., average grades from in-
structors and below average grades on the
term-end examination. Expectations of the
inverse for the higher examination grade
group were not supported by the results of
the study. This group’s very high mean
ACE score and reading score also suggest 
the unlikelihood of finding many of the 
expected variety in the group. 

Considered from the point of view of 
ability to achieve, the evidence suggests 
that the higher instructor grade group 
received higher grades from their instruc- 
tors than they should have, while the 
higher examination grade group received 
lower instructor grades than they should 
have. The superiority of the nondeviant 
grade group to the higher instructor grade 
group in general aptitude and reading 
ability also raises a question about the 
similarity of the instructor grades of these 
two groups. Again, the evidence seems to 
force the conclusion that students who 
were characterized as being more conform- 
ing, compulsive, rigid, and insecure re- 
ceived higher grades from their instructors 
than would be expected of them on the 
basis of ability alone. The information ob- 
tained in interviews suggests that the 
average instructor grades obtained by the 
higher examination group must be ex- 
plained in terms of a lack of motivation 
for and indifference toward the Basic 
College courses. 

No thought was given to determining 
which of these two groups, higher instruc- 
tor grade group or higher examination 
grade group, were the overachievers and 
which the underachievers. With respect to 
instructor grades, the higher examination 
grade group could be called under- 
achievers; but, in general, they did make 
high term-end examination grades, thus 
demonstrating a high degree of mastery of 
their subjects. Conversely, the higher in- 
structor grade group could be called over- 
achievers because their instructor grades 
appeared to be higher than warranted by 
their ability; but this group’s examination 
grades do not indicate mastery of the sub- 
ject, or overachievement. The results of 
this study suggest that what has often 
been called over or underachievement may
in some cases have been a function of the
method of measuring achievement. In such
cases a student’s grades might not be an
accurate description of his relative mastery 
of the subject. 


SUMMARY 


The purpose of this investigation was to 
discover some of the factors which differ- 
entiate students whose instructor grades 
were consistently higher than their grades 
on departmental term-end examinations 
from students who consistently got the 
higher grades on their departmental term- 
end examinations. Students who consist- 
ently received the higher grade from in- 
structors were found to receive average 
instructor grades and below average grades 
on the term-end examinations, while stu- 
dents who consistently received the higher 
grade on the term-end examinations were 
found to have superior examination grades 
and average instructor grades. Aptitude 
scores and reading scores of students who 
received the higher grades from instruc- 
tors were found to be significantly lower, 
beyond the .001 level of confidence, than 
those of their opposites in achievement.
The Inventory of Beliefs test character-
ized the higher instructor grade group as
being more compulsive, conforming, rigid,
and generally insecure than their oppo-
sites.